Impact evaluation design
The AGEP evaluation was based on a multi-arm cluster randomized design implemented in ten sites, half urban and half rural, in four provinces in Zambia. Study provinces and the number of sites per province were selected purposively, based on the feasibility of operating the program while conducting an embedded randomized impact evaluation, and on discussions with the donor regarding the diversity of the target populations. Within each province and urban/rural stratum, sites were randomly selected from a larger number of possible locations. Rural sites contained multiple contiguous or proximal villages or chiefdoms, while in urban sites the program was implemented in one or more high-density housing compounds. The study adhered to CONSORT guidelines.
Interventions and study arms
AGEP included three intervention components. The first was weekly girls group meetings facilitated by a female mentor from the community. Mentors were aged 20–35, were selected via an interview process, and participated in an initial 10-day training on the content and mentor skills, a five-day refresher training midway through the intervention and monthly supervision meetings with program staff. Groups were segmented by age and marital status, and included curricula-guided sessions on sexual and reproductive health, HIV, life skills and financial education. The weekly girls groups, known as ‘safe spaces,’ were considered the core component of AGEP because a growing literature indicated that regular girls’ meetings facilitated by a female mentor, combining sexual and reproductive health, life skills and economic-strengthening components, had been successful in several countries in sub-Saharan Africa in improving longer-term health and economic outcomes [11, 13, 14]. The second component was the provision of a health voucher to girls, which could be used at contracted public and private facilities for free general wellness and sexual and reproductive health services. The third component was the provision of an adolescent-friendly savings account, with features such as low fees, a low opening balance (~$0.25) and the ability for a minor to transact on the account. The bank accounts were designed in partnership with the National Savings and Credit Bank of Zambia. Further details of AGEP’s interventions are described elsewhere. The study included three intervention arms to assess the effect of the weekly meetings alone, as well as the added effect of the “add-on components”: Arm 1 included weekly meetings only, Arm 2 included weekly meetings and the health voucher, and Arm 3 included weekly meetings, the health voucher and the savings account. A fourth arm served as a control, in which no interventions were offered.
Sample size and power analysis
A minimum detectable effect (MDE) approach was used for the power analysis because limited baseline evidence was available to support a sample size estimation approach. MDEs for the study’s longer-term outcomes, comparing each arm against the control, were estimated using Optimal Design Plus Software Version 3.0 for a multi-site cluster randomized trial. The MDE estimation was stratified by urban/rural location and by younger and older cohorts (10–14 and 15–19). Expected endline values (four years after baseline) of study indicators for the control arm were obtained from the 2007 Zambia DHS. Power was set at 0.80, the significance level at 0.05, and the effect size variability was fixed at 0.00, as analyses controlled for site fixed effects. Given the budget available for fieldwork and a total of ten study sites, MDEs for different combinations of numbers of clusters and respondents per cluster were compared. The most efficient sample size was determined to be 40 clusters per study arm (four clusters per arm per site) and 20 participants per cluster, 10 per age cohort, for a total sample at endline of 3200. A cluster was defined as a Census Supervisory Area (CSA), as delineated by the Zambia Central Statistical Office (Footnote 2). The MDEs are reported elsewhere.
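The sample-size arithmetic in this paragraph can be checked with a few lines of Python. This is only a bookkeeping sketch: the variable names are illustrative rather than taken from the study, and it does not reproduce the MDE calculations run in Optimal Design Plus.

```python
# Design constants as stated in the text (names are illustrative).
SITES = 10
STUDY_ARMS = 4                  # three intervention arms plus the control arm
CLUSTERS_PER_ARM_PER_SITE = 4
PARTICIPANTS_PER_CLUSTER = 20
AGE_COHORTS = 2                 # 10-14 and 15-19

clusters_per_arm = SITES * CLUSTERS_PER_ARM_PER_SITE
clusters_per_site = STUDY_ARMS * CLUSTERS_PER_ARM_PER_SITE
total_clusters = STUDY_ARMS * clusters_per_arm
participants_per_cohort = PARTICIPANTS_PER_CLUSTER // AGE_COHORTS
endline_sample = total_clusters * PARTICIPANTS_PER_CLUSTER

print(clusters_per_arm, clusters_per_site, total_clusters, endline_sample)
```

The resulting 40 clusters per arm and endline sample of 3200 agree with the figures reported above.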
Randomization and study participants
Random selection of clusters and assignment to study arms were conducted at the site level via a public lottery attended by key community stakeholders. At the lottery, one bowl contained slips of paper, each bearing one CSA code, and another bowl contained sixteen slips of paper, each listing one study arm (four slips per arm). Representatives from the community came to the front, selected one CSA slip and then one study arm slip. The process was repeated 16 times, completing CSA selection and study arm assignment. Following the public lottery, a household listing was conducted in 2013 in all selected clusters. In service of the AGEP goal of reaching the most vulnerable girls, while also creating conditions for conducting a rigorous cluster randomized evaluation, eligible girls were identified for recruitment into the AGEP intervention through the household listing. A vulnerability score was derived from an ordinary least squares (OLS) regression with the number of grades behind for age as the dependent variable; independent variables in the regression model were age, being out of school, ever having been married and having at least one child. The estimated residual of the regression was then used to represent vulnerability, with higher residuals indicating higher vulnerability. Both married and unmarried girls were ranked from highest to lowest vulnerability in each site, and the 2000 most vulnerable in each site were tagged. Of the tagged girls, those living in AGEP intervention clusters were invited to participate in the program, while girls living in control clusters were not issued an invitation.
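The residual-based vulnerability score described above can be sketched as follows. This is a minimal illustration, not the study's code: the records are invented, the field order and the hand-rolled `ols_residuals` helper are assumptions made so the example runs with the standard library alone, and `N_TAGGED = 3` stands in for the 2000 girls tagged per site.

```python
def ols_residuals(X, y):
    """Residuals from an OLS fit of y on the columns of X (X must include a constant)."""
    n, k = len(X), len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented k x (k+1) matrix.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[i][k] for i in range(k)]
    return [y[r] - sum(b * x for b, x in zip(beta, X[r])) for r in range(n)]


# Hypothetical household-listing records:
# (age, out_of_school, ever_married, has_child, grades_behind_for_age)
girls = [
    (10, 0, 0, 0, 0),
    (12, 0, 0, 0, 1),
    (14, 1, 0, 0, 3),
    (15, 0, 0, 1, 2),
    (16, 1, 1, 0, 4),
    (17, 1, 0, 1, 5),
    (18, 0, 1, 1, 2),
    (19, 1, 1, 1, 6),
]
X = [[1.0, age, out, married, child] for age, out, married, child, _ in girls]
y = [float(g[4]) for g in girls]

# The residual is the vulnerability score: more grades behind than predicted
# by age, schooling, marriage and childbearing means higher vulnerability.
vulnerability = ols_residuals(X, y)

N_TAGGED = 3  # stand-in for the 2000 most vulnerable girls per site
ranked = sorted(range(len(girls)), key=lambda i: vulnerability[i], reverse=True)
tagged = ranked[:N_TAGGED]
print("Indices of tagged girls:", tagged)
```

Because the score is a residual, it measures how far behind a girl is relative to girls with the same age, schooling, marital and childbearing status, rather than her absolute number of grades behind.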
The lists of most vulnerable girls in each site, excluding ever-married girls, constituted the sampling frame for the research. The baseline target sample was inflated to account for conservative estimates of attrition and refusals, in particular among the older cohort. Within each cluster, up to 17 girls aged 10–14 and up to 23 girls aged 15–19 were randomly selected for participation in the research (Footnote 3).
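The within-cluster selection can be sketched as an age-stratified random draw with a cap per stratum. The record layout (`id`, `age`, `ever_married`), the toy frame and the seed are hypothetical; the frame is assumed to contain only girls already tagged as most vulnerable in that cluster.

```python
import random

MAX_YOUNGER = 17  # cap for girls aged 10-14, from the text
MAX_OLDER = 23    # cap for girls aged 15-19, from the text


def sample_cluster(frame, rng):
    """Draw the research sample for one cluster from its sampling frame.

    `frame` is a list of dicts with hypothetical keys 'id', 'age' and
    'ever_married'; ever-married girls are excluded, as in the text.
    """
    eligible = [g for g in frame if not g["ever_married"]]
    younger = [g for g in eligible if 10 <= g["age"] <= 14]
    older = [g for g in eligible if 15 <= g["age"] <= 19]
    # "Up to" means a cluster with fewer eligible girls contributes all of them.
    return (rng.sample(younger, min(MAX_YOUNGER, len(younger)))
            + rng.sample(older, min(MAX_OLDER, len(older))))


rng = random.Random(2013)  # fixed seed so the sketch is reproducible
frame = [{"id": i, "age": 10 + i % 10, "ever_married": i % 7 == 0} for i in range(120)]
selected = sample_cluster(frame, rng)
print(len(selected), "girls selected in this cluster")
```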
The AGEP research protocol was reviewed and approved by the Population Council Institutional Review Board (PC-IRB), the University of Zambia’s Research Ethics Committee (UNZA-REC) and the Zambian Ministry of Health. Informed consent was obtained from all participants. Parental or guardian consent was also obtained for participants under 18 years old (Footnote 4).
The adolescent survey instruments were designed to measure both girls’ assets – including gender norms, self-efficacy, HIV and sexual and reproductive health (SRH) knowledge, financial literacy and savings behavior – and longer-term outcomes – including schooling attainment, age at sexual debut, first birth and marriage. Attendance data for each girl, including date of attendance and session topic, were collected on mobile phones by group mentors via Open Data Kit (ODK) and then uploaded to a cloud-based database.
This paper focuses on longer-term educational and fertility outcomes as well as a subset of hypothesized mediating indicators representing girls’ assets within each domain (social, health and economic). Additional file 1 provides detailed definitions for all outcomes and mediating indicators. Indicators were selected based on their theorized influence on adolescent-girl outcomes and their direct link to the intervention activities and curriculum topics. The following eight mediating indicators reflecting girls’ social, health and economic assets were evaluated among all girls: (1) Social Assets: a) self-efficacy, measured as a score ranging from 0 to 10 based on the Generalized Self-Efficacy Scale; b) whether the girl had a safe space in the community to meet with friends; c) positive gender attitudes, measured as a score ranging from 0 to 7, with questions based on measures developed in previous studies [11, 18]; and d) non-acceptability of intimate partner violence (IPV), based on Demographic and Health Survey (DHS) questions, measured as a binary variable equal to 1 if the respondent did not agree that a husband is justified in hitting his wife for any of five specific reasons; (2) Economic Assets: a) financial literacy, measured based on previous studies [11, 18, 19] as a score ranging from 0 to 9; and b) a dichotomous variable indicating whether the girl saved any money in the past year; and (3) Health Assets: a) knowledge of the fertile period and contraceptive methods, based on DHS questions, measured as a score ranging from 0 to 11; and b) HIV/AIDS knowledge, based on DHS-related questions, measured as a score ranging from 0 to 11. The following two mediating indicators reflecting sexual behavior were evaluated among girls aged 15 years and older who had initiated sex: (1) used a condom at first sex; and (2) reported having had transactional sex.
The mediating indicators reflect the original intent of the mediating outcomes proposed in the initial study protocol submitted for ethical review, but were revised after baseline to capture the most valid and reliable measures still in line with the theoretical framework (published elsewhere).
Two educational outcomes were evaluated among all girls: (1) completed the last grade of primary school (grade 7); and (2) completed grade 9. The following four longer-term fertility outcomes were evaluated among girls aged 15 years and older: (1) ever had sex; (2) ever been pregnant; (3) ever given birth; and (4) ever been married. All of these were identified as primary long-term outcomes in the initial study protocol submitted for ethical review.
This paper uses the sample of girls interviewed at baseline, Round 3 (end of the intervention) and Round 5 (2 years after the end of the intervention). Baseline data were collected between July 2013 and February 2014. Round 3 was conducted between July 2015 and January 2016 among all girls interviewed at baseline. Round 5 was conducted between July and December 2017. Due to budgetary constraints, a sub-sample of girls interviewed at Round 3 was randomly selected for participation in Round 5.
To assess attrition bias, a probit regression was estimated in which the outcome was equal to one if the respondent was excluded from the analytical sample. Covariates included the respondent’s study arm and the following socio-demographic characteristics measured at baseline: age; attended school in the current year; grade attainment; literacy, defined as reading out loud a full sentence in English or a local language; whether the mother and father were alive and their co-residence status; mother’s and father’s completion of primary school; household wealth quintiles, derived from a wealth index estimated using principal-components analysis; vulnerability quintiles, derived from the vulnerability score estimated to target girls for invitation to AGEP; and dummy variables for program sites. To evaluate baseline balance across the intervention and control arms among girls in the analytical sample, means and 95% confidence intervals were estimated for the set of socio-demographic characteristics listed above, with differences tested using Pearson chi-square tests for categorical variables and linear regression for continuous variables, all accounting for sample clustering at the CSA level. Means and 95% confidence intervals, accounting for clustering at the CSA level, were also estimated for mediating indicators and outcome variables measured at baseline and at Rounds 3 and 5 to explore change across time.
The primary impact analysis was based upon an “intent-to-treat” (ITT) approach for outcomes measured at baseline and Rounds 3 and 5. The ITT treatment indicator was defined as whether a girl was invited to participate in AGEP, regardless of her actual participation in the program. Linear regressions with girl-level fixed effects were estimated. All regressions were estimated with robust standard errors adjusted for clustering at the CSA level, and all models controlled for girl’s age. Difference-in-differences (DID) estimates were obtained by comparing the change between baseline (Round 1) and Rounds 3 and 5 for intervention versus control arms.
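The logic of the ITT difference-in-differences can be illustrated with a toy two-round panel. In a balanced panel with two rounds and no other covariates, the coefficient on the invitation-by-round term in a girl fixed-effects regression equals the difference in mean within-girl changes computed below. The data and field names are hypothetical, and the sketch deliberately omits the age control, the third survey round and the cluster-adjusted standard errors used in the study.

```python
from statistics import mean

# Hypothetical two-round panel: one record per girl with a binary outcome
# (for example, completed grade 7) at baseline and at follow-up.
panel = [
    {"girl": 1, "invited": 1, "y_base": 0, "y_follow": 1},
    {"girl": 2, "invited": 1, "y_base": 0, "y_follow": 1},
    {"girl": 3, "invited": 1, "y_base": 0, "y_follow": 0},
    {"girl": 4, "invited": 1, "y_base": 1, "y_follow": 1},
    {"girl": 5, "invited": 0, "y_base": 0, "y_follow": 0},
    {"girl": 6, "invited": 0, "y_base": 0, "y_follow": 1},
    {"girl": 7, "invited": 0, "y_base": 0, "y_follow": 0},
    {"girl": 8, "invited": 0, "y_base": 1, "y_follow": 1},
]


def mean_change(records, invited):
    """Average within-girl change in the outcome for one assignment group."""
    return mean(r["y_follow"] - r["y_base"] for r in records if r["invited"] == invited)


# Differencing within girl removes anything fixed about her (the fixed effect);
# subtracting the control-arm change removes the common time trend.
did_itt = mean_change(panel, 1) - mean_change(panel, 0)
print(f"ITT difference-in-differences: {did_itt:.3f}")
```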
To account for the varying levels of participation in AGEP observed across girls (due to girls not signing up for the program, drop-out or absences), a secondary analysis was conducted to obtain “treatment-on-the-treated” (TOT) estimates from two-stage least squares, instrumental variable (IV) regressions. The TOT estimates relied on instrumentation to account for unobservable differences that may have contributed to differential program participation (Footnote 5). These unobservable factors may be correlated with the adolescent girl outcomes and would lead to biased estimates of impact if not statistically accounted for in the analysis. The first stage of the estimations predicted program participation, with the indicator of treatment defined as attendance at 52 or more meetings (half of the total number of girls group meetings). The instrumental variable used in the first stage was the randomized invitation to participate in AGEP. The validity of the instrumental variable was assessed with F-tests on excluded instruments, the Wald F-statistic and the Hansen statistic for overidentification. DID coefficients were estimated in the second stage of the regressions. All regressions were estimated with robust standard errors adjusted for clustering at the CSA level, and all models controlled for girl’s age.
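The intuition behind the TOT estimate can be shown with the Wald estimator, which is what two-stage least squares reduces to when there is a single binary instrument, a binary treatment and no covariates: the ITT effect is divided by the difference in participation rates induced by the invitation. The records below are invented and the pooled single-instrument setup is a simplifying assumption; the study's models also included the age control, fixed effects and clustered standard errors.

```python
from statistics import mean

MIN_MEETINGS = 52  # participation threshold stated in the text

# Hypothetical records: invitation status, meetings attended, and the
# within-girl change in an outcome between baseline and follow-up.
girls = [
    {"invited": 1, "meetings": 70, "change": 1},
    {"invited": 1, "meetings": 60, "change": 1},
    {"invited": 1, "meetings": 10, "change": 0},
    {"invited": 1, "meetings": 0, "change": 0},
    {"invited": 0, "meetings": 0, "change": 0},
    {"invited": 0, "meetings": 0, "change": 1},
    {"invited": 0, "meetings": 0, "change": 0},
    {"invited": 0, "meetings": 0, "change": 0},
]


def participated(record):
    """Treatment indicator: attended at least MIN_MEETINGS meetings."""
    return int(record["meetings"] >= MIN_MEETINGS)


def group_mean(records, invited, value):
    """Mean of value(record) among girls with the given invitation status."""
    return mean(value(r) for r in records if r["invited"] == invited)


# First stage: effect of the randomized invitation on participation.
first_stage = group_mean(girls, 1, participated) - group_mean(girls, 0, participated)
# Reduced form: effect of the invitation on the outcome (the ITT effect).
itt = group_mean(girls, 1, lambda r: r["change"]) - group_mean(girls, 0, lambda r: r["change"])
# Wald / IV estimate: ITT effect scaled by the first-stage difference.
tot = itt / first_stage
print(f"first stage={first_stage:.2f}, ITT={itt:.2f}, TOT={tot:.2f}")
```

Because only half of the invited girls in this toy example reach the 52-meeting threshold, the TOT estimate is twice the ITT estimate.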
After the primary (ITT) and secondary (TOT) analyses were conducted, we explored whether the program had differential impacts at Round 5 across sub-groups of girls. To this end, we re-estimated the ITT regressions including interaction terms between the treatment indicators and the following characteristics: urban vs rural, older (aged 15–19 at baseline) vs younger (aged 10–14 at baseline), highest vulnerability quintile vs lower vulnerability quintiles, and poorest household wealth quintile vs less poor household wealth quintiles (Footnote 6). For example, to test whether the intervention had a greater impact for girls who were in urban sites at baseline, the ITT regressions were re-estimated including interactions of the DID term with a variable indicating whether the girl was living in an urban site at baseline. The coefficient of this interaction indicated whether the impact of the intervention differed by baseline location. A similar approach was used to test the effects of age, levels of vulnerability and household wealth at baseline. All regressions here and above were estimated in Stata® 15.1.
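The interaction coefficient in the sub-group analysis can be read as a difference between two DID estimates. Assuming a model in which the time trend is also allowed to differ by sub-group, and no other covariates, the coefficient on the DID-by-urban interaction equals the urban DID minus the rural DID, as in this toy calculation. The records and field names are hypothetical.

```python
from statistics import mean


def did(records):
    """Difference in mean within-girl change, invited minus control."""
    treated = mean(r["change"] for r in records if r["invited"] == 1)
    control = mean(r["change"] for r in records if r["invited"] == 0)
    return treated - control


def make(urban, invited, changes):
    """Build hypothetical records for one site-type and assignment group."""
    return [{"urban": urban, "invited": invited, "change": c} for c in changes]


girls = (
    make(1, 1, [1, 1, 1, 0]) + make(1, 0, [0, 1, 0, 0])    # urban sites
    + make(0, 1, [1, 0, 0, 0]) + make(0, 0, [0, 0, 0, 0])  # rural sites
)

did_urban = did([r for r in girls if r["urban"] == 1])
did_rural = did([r for r in girls if r["urban"] == 0])
# A positive value means a larger estimated impact in urban sites.
interaction = did_urban - did_rural
print(f"urban DID={did_urban:.2f}, rural DID={did_rural:.2f}, interaction={interaction:.2f}")
```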