The impact of the Adolescent Girls Empowerment Program (AGEP) on short and long term social, economic, education and fertility outcomes: a cluster randomized controlled trial in Zambia

Background Adolescent girls in Zambia face risks and vulnerabilities that challenge their healthy development into young women: early marriage and childbearing, sexual and gender-based violence, unintended pregnancy and HIV. The Adolescent Girls Empowerment Program (AGEP) was designed to address these challenges by building girls’ social, health and economic assets in the short term and improving sexual behavior, early marriage, pregnancy and education in the longer term. The two-year intervention included weekly, mentor-led, girls group meetings on health, life skills and financial education. Additional intervention components included a health voucher redeemable for general wellness and reproductive health services and an adolescent-friendly savings account. Methods A cluster-randomized-controlled trial with longitudinal observations evaluated the impact of AGEP on key indicators immediately and two years after program end. Baseline data were collected from never-married adolescent girls in 120 intervention clusters (3515 girls) and 40 control clusters (1146 girls) and again two and four years later. An intent-to-treat analysis assessed the impact of AGEP on girls’ social, health and economic assets, sexual behaviors, education and fertility outcomes. A treatment-on-the-treated analysis using two-stage, instrumental variables regression was also conducted to assess program impact for those who participated. Results The intervention had modest, positive impacts on sexual and reproductive health knowledge after two and four years, financial literacy after two years, savings behavior after two and four years, self-efficacy after four years and transactional sex after two and four years. There was no effect of AGEP on the primary education or fertility outcomes, nor on norms regarding gender equity, acceptability of intimate partner violence and HIV knowledge. 
Conclusions Although the intervention led to sustained change in a small number of individual outcomes, overall, the intervention did not lead to girls acquiring a comprehensive set of social, health and economic assets, or change their educational and fertility outcomes. It is important to explore additional interventions that may be needed for the most vulnerable girls, particularly those that address household economic conditions. Additional attention should be given to the social and economic environment in which girls are living. Trial registration ISRCTN29322231. Trial Registration Date: March 04, 2016; retrospectively registered.


Safe spaces
Background
Adolescent girls¹ in Zambia face risks and vulnerabilities that challenge their healthy development into young women, including early marriage and childbearing, sexual and gender-based violence, unwanted pregnancy and the acquisition of HIV and other sexually transmitted infections [1]. According to the 2013-2014 Zambia Demographic and Health Survey (DHS), nearly one in three young women aged 20-24 had married by age 18, and a similar percentage had begun childbearing by that age [2]. More than 35% of 15-24-year-old females have experienced physical violence, 12% have experienced sexual violence and 48% report that wife-beating is justified in certain circumstances [2]. Among 15-19-year-olds, 4.8% of females are HIV-positive as compared to 4.1% of males. The gender disparity increases in the 20-24-year-old group, in which 11.2% of females are HIV-positive as compared to 7.3% of males [2]. The literature shows that adolescent girls, particularly in developing countries, lack the assets and skills needed to break out of the cycle of poverty and capitalize on the opportunities that exist [3,4]. Furthermore, in these settings, women's and girls' opportunities are limited by traditional practices, adverse gender norms and roles, and weak institutions and laws [3,4]. For instance, Duflo argues that inequality, poverty and the lack of access to economic assets, opportunities and labor markets are primary drivers of the persistent disadvantage of women relative to men [3]. Other studies have directly linked gender power inequality to HIV risk behaviors and exposures [5]. For example, data from adolescent girls in South Africa showed that decreases in social isolation and economic vulnerability are linked to decreases in experiences of sexual coercion and transactional sex [6]. Calls have been made to address women's and girls' health issues via improvements in gender equality and empowerment [7,8].
The Adolescent Girls Empowerment Program (AGEP) in Zambia hypothesized that one way to improve girls' longer-term outcomes was through a program that aimed to empower adolescent girls by building their social, health and economic assets, allowing them, in turn, to reduce their vulnerabilities and capitalize on opportunities to improve their health, fertility and educational outcomes [1,9,10]. Studies have shown that some of these links between asset building and longer-term health and educational outcomes exist, which makes them important to explore as pathways to health behavior change. For example, a program in Uganda found that girls who had increased their social and health assets were able to simultaneously increase their economic assets without experiencing an increase in sexual harassment and violence, whereas girls who built only economic assets experienced an increase in sexual harassment and violence as a result [11]. In Kenya, a qualitative study of young women 18-24 years old found that having savings was perceived as a key facilitator for translating knowledge of safer sex into practice [12]. This paper evaluates AGEP's theory of change by assessing the impact of the program on adolescent girls' mid- and longer-term indicators using data collected at the end of the two-year intervention and two years after the end of the intervention.

Impact evaluation design
The AGEP evaluation was based on a multi-arm randomized cluster design implemented in ten sites, half urban and half rural, in four provinces in Zambia. Study provinces and the number of sites per province were selected purposefully, based on the feasibility of operating the program in the context of conducting an embedded randomized impact evaluation, as well as through discussions with the donor regarding the diversity of the target populations. Sites within provinces, stratified by urban/rural location, were randomly selected from a larger number of possible locations. Sites in rural areas contained multiple contiguous or proximal villages or chiefdoms, while in urban sites the program was implemented in one or more high-density housing compounds. The study adhered to CONSORT guidelines.
¹ Adolescence in this paper is defined using the World Health Organization definition of 10-19 years old - http://www.who.int/topics/adolescent_health/en/

Interventions and study arms
AGEP included three intervention components. The first was weekly girls group meetings that were facilitated by a female mentor from the community. Mentors were aged 20-35, were selected via an interview process and participated in an initial 10-day training on the content and mentor skills, a five-day refresher training midway through the intervention and monthly supervision meetings with program staff. Groups were segmented by age and marital status, and included curricula-guided sessions on sexual and reproductive health, HIV, life skills and financial education. The weekly girls groups, known as 'safe spaces,' were considered the core component of AGEP, as a growing literature indicated that girls meeting regularly for sessions facilitated by a female mentor, combining sexual and reproductive health, life skills and economic-strengthening components, had been successful in several countries in sub-Saharan Africa in improving longer-term health and economic outcomes [11,13,14].
A second component of the intervention was the provision of a health voucher to girls, which could be used at contracted public and private facilities for free general wellness and sexual and reproductive health services. A third component of AGEP was the provision of an adolescent-friendly savings account, with features such as low fees, low opening balance (~$0.25) and the ability for a minor to transact on the account. The design of the bank accounts was developed in partnership with the National Savings and Credit Bank of Zambia. Further details of AGEP's interventions are described elsewhere [1]. The study included three intervention arms to assess the effect of the weekly meetings alone, as well as the added effect of the "add-on components": Arm 1 included weekly meetings only, Arm 2 included weekly meetings and the health voucher, and Arm 3 included weekly meetings, the health voucher and the savings account. A fourth arm was a control in which there were no interventions offered.

Sample size and power analysis
A minimum detectable effect (MDE) approach was used to conduct power analysis as limited baseline evidence was available to support a sample size estimation approach. MDEs for the study's longer-term outcomes comparing each arm against the control were estimated using Optimal Design Plus Software Version 3.0 for a multi-site cluster randomized trial [15]. The MDE estimation was stratified by urban/rural location and by younger and older cohorts (10-14 and 15-19). Endline (four years after the baseline) estimates of study indicators for the control arm were obtained from the 2007 Zambia DHS [16]. Power was set at 0.80, significance level at 0.05, and the effect size variability was fixed at 0.00, as analyses controlled for site fixed effects. Given the budget available for fieldwork and a total of ten study sites, MDEs for different combinations of numbers of clusters and respondents per cluster were compared and it was determined that the most efficient sample size was 40 clusters per study arm (four clusters per arm per site) and 20 participants per cluster, 10 per age cohort, for a total sample at endline of 3200. A cluster was defined as a Census Supervisory Area (CSA), as delineated by the Zambia Central Statistical Office.² The MDEs are reported elsewhere [1].
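The sample-size arithmetic in this paragraph can be checked with a short sketch. The constants come from the text; the function and variable names are illustrative only and are not part of the study's software.

```python
# Design constants taken from the text above.
SITES = 10
ARMS = 4                       # three intervention arms plus one control arm
CLUSTERS_PER_ARM_PER_SITE = 4
GIRLS_PER_CLUSTER = 20         # 10 per age cohort (10-14 and 15-19)


def design_totals(sites, arms, clusters_per_arm_per_site, girls_per_cluster):
    """Return cluster and respondent totals implied by the design (hypothetical helper)."""
    clusters_per_arm = sites * clusters_per_arm_per_site
    total_clusters = clusters_per_arm * arms
    return {
        "clusters_per_arm": clusters_per_arm,
        "total_clusters": total_clusters,
        "intervention_clusters": clusters_per_arm * (arms - 1),
        "endline_sample": total_clusters * girls_per_cluster,
    }


totals = design_totals(SITES, ARMS, CLUSTERS_PER_ARM_PER_SITE, GIRLS_PER_CLUSTER)
print(totals)
```

Under these inputs the design implies 40 clusters per arm, 160 clusters overall (120 intervention and 40 control, as in the abstract) and an endline target of 3200 respondents.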

Randomization and study participants
Random selection of clusters and assignment to study arms was conducted at the site level via a public lottery attended by key community stakeholders. At the lottery, one bowl contained slips of papers that had one CSA code per slip and another bowl had sixteen slips of paper, each with one arm listed (four per arm). Representatives from the community came to the front, selected one CSA slip and then one study arm slip. The process was repeated 16 times, completing CSA selection and study arm assignment. Following the public lottery, a household listing was conducted in 2013 in all selected clusters. In service of the AGEP goal of reaching the most vulnerable girls, while also creating conditions for conducting a rigorous cluster randomized evaluation, eligible girls were identified for recruitment into the AGEP intervention through the household listing. A vulnerability score was assigned using an ordinary least squares (OLS) regression to estimate the number of grades behind for age as the dependent variable; independent variables in the regression model included age, not in school, ever married and having at least one child. The estimated residual of the regression was then used to represent vulnerability, with higher residuals indicating higher vulnerability. Both married and unmarried girls were ranked from highest to lowest vulnerability in each site and the 2000 most vulnerable in each site were tagged. Of tagged girls, those living in AGEP intervention clusters were invited to participate in the program, while girls living in control clusters were not issued an invitation.
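The vulnerability scoring described above can be sketched as follows. This is a minimal illustration on made-up data, assuming a plain OLS fit with an intercept; the function names, the toy values and the tie-breaking rule are assumptions, not the study's actual code.

```python
import numpy as np


def vulnerability_scores(grades_behind, age, not_in_school, ever_married, has_child):
    """Residuals from an OLS regression of grades-behind-for-age on the four covariates.

    Higher residuals are read as higher vulnerability, as described in the text.
    """
    y = np.asarray(grades_behind, dtype=float)
    X = np.column_stack(
        [np.ones_like(y), age, not_in_school, ever_married, has_child]
    ).astype(float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    return y - X @ beta


def tag_most_vulnerable(scores, n_tagged):
    """Indices of the n_tagged girls with the highest scores (stable order on ties)."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    return order[:n_tagged]


# Toy household-listing data for eight girls (entirely hypothetical).
scores = vulnerability_scores(
    grades_behind=[0, 2, 1, 4, 3, 0, 5, 1],
    age=[10, 12, 13, 15, 16, 11, 18, 14],
    not_in_school=[0, 0, 0, 1, 1, 0, 1, 0],
    ever_married=[0, 0, 0, 0, 1, 0, 1, 0],
    has_child=[0, 0, 0, 0, 1, 0, 0, 0],
)
tagged = tag_most_vulnerable(scores, n_tagged=3)
print(np.round(scores, 2), tagged)
```

In the study the cut-off was the 2000 highest-ranked girls per site; `n_tagged=3` here only keeps the toy example small.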
The lists of most vulnerable girls in each site, excluding ever-married girls, constituted the sampling frame for the research. The baseline target sample was inflated to account for conservative estimates of attrition and refusals, in particular among the older cohort. Within each cluster, stratified by two age groups (10-14 and 15-19 years old), up to 17 girls aged 10-14 and up to 23 girls aged 15-19 were randomly selected for participation in the research.³ The AGEP research protocol was reviewed and approved by the Population Council Institutional Review Board (PC-IRB), the University of Zambia's Research Ethics Committee (UNZA-REC) and the Zambian Ministry of Health. Informed consent was obtained from all participants. Parental or guardian consent was also obtained for participants under 18 years old.⁴
² A CSA contains a collection of adjacent standard enumeration areas (SEA) that range in number from two to eight per CSA. SEAs are geographical areas that contain approximately 100 households in rural areas and 150 households in urban areas.

Study instruments
The adolescent survey instruments were designed to measure both girls' assets, including gender norms, self-efficacy, HIV and sexual and reproductive health (SRH) knowledge, financial literacy and savings behavior, and longer-term outcomes, including schooling attainment, age of sexual debut, first birth and marriage. Attendance data for each girl, including date of attendance and session topic, were collected on mobile phones by group mentors via Open Data Kit (ODK). Attendance data collected with ODK were then uploaded to a cloud-based database.

Outcomes
This paper focuses on longer-term educational and fertility outcomes as well as a subset of hypothesized mediating indicators representing girls' assets within each domain (social, health and economic). Additional file 1 provides detailed definitions for all outcomes and mediating indicators. Indicators were selected based on their theorized influence on adolescent-girl outcomes, while also being directly linked to the intervention activities and curriculum topics. The following eight mediating indicators reflecting girls' social, health and economic assets were evaluated among all girls: (1) Social Assets: a) self-efficacy, measured as a score with 0-10 range based on the Generalized Self-Efficacy Scale [17]; b) whether the girl had a safe space in the community to meet with friends; c) positive gender attitudes, with questions based on measures developed in previous studies [11,18], measured as a score ranging from 0 to 7; and d) non-acceptability of intimate partner violence (IPV), based on Demographic and Health Survey (DHS) questions, measured as a binary variable that is equal to 1 if the respondent did not agree that a husband is justified in hitting his wife for any of five specific reasons; (2) Economic Assets: a) financial literacy, measured based on previous studies [11,18,19] as a score ranging from 0 to 9; and b) a dichotomous variable of whether the girl saved any money in the past year; and (3) Health Assets: a) knowledge of the fertile period and contraceptive methods based on DHS questions, measured as a score with 0-11 range; and b) HIV/AIDS knowledge based on DHS-related questions, measured as a score with 0-11 range. The following two mediating indicators reflecting sexual behavior were evaluated among girls aged 15 years and older who had initiated sex: (1) used a condom at first sex; and (2) reported having had transactional sex.
The mediating indicators, while reflecting the original intent of the proposed mediating outcomes in the initial study protocol submitted for ethical review, were revised after baseline to capture the most valid and reliable measures still in line with the theoretical framework (published elsewhere [1]).
Two educational outcomes were evaluated among all girls: (1) completed the last grade of primary school (grade 7); and (2) completed grade 9. The following four longer-term fertility outcomes were evaluated among girls aged 15 years and older: (1) ever had sex; (2) ever been pregnant; (3) ever given birth; and (4) ever been married. All of these were identified as primary long-term outcomes in the initial study protocol submitted for ethical review.

Analytical sample
This paper uses the sample of girls interviewed at baseline, Round 3 (end of the intervention) and Round 5 (2 years after the end of the intervention). Baseline data were collected between July 2013 and February 2014. Round 3 was conducted between July 2015 and January 2016 among all girls interviewed at baseline. Round 5 was conducted between July and December 2017. Due to budgetary constraints, a sub-sample of girls interviewed at Round 3 was randomly selected for participation in Round 5.

Statistical analysis
To assess attrition bias, a probit regression where the outcome was equal to one if the respondent was excluded from the analytical sample was estimated. Covariates included the respondent's study arm, the following socio-demographic characteristics measured at baseline: age; attended school in the current year; grade attainment; literate, defined as reading out loud a full sentence in English or a local language; mother and father alive and co-residence status; mother's and father's completion of primary school; household wealth quintiles, derived from a wealth index estimated using principal-components analysis; vulnerability quintiles derived from the vulnerability score estimated to target girls for invitation to AGEP; and dummy variables for program sites. To evaluate baseline balance across the intervention and control arms among girls in the analytical sample, means, 95% confidence intervals and Pearson chi-square tests for categorical variables and linear regression for continuous variables, accounting for sample clustering at the CSA level, were estimated for the set of socio-demographic characteristics listed above. Means and 95% confidence intervals, accounting for clustering at the CSA level, were also estimated for mediating indicators and outcome variables measured at baseline and at Rounds 3 and 5 to explore change across time.
The primary impact analysis was based upon an "intent-to-treat" (ITT) approach for outcomes measured at baseline and Rounds 3 and 5. The ITT treatment indicator was defined as whether a girl was invited to participate in AGEP, regardless of her actual participation in the program. Linear regressions with girl-level fixed effects were estimated. All regressions were estimated with robust standard errors adjusted for clustering at the CSA level, and all models controlled for girl's age. Difference-in-differences (DID) were estimated, comparing the change between baseline (Round 1) and Rounds 3 and 5 for intervention versus control arms.
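One way to write the ITT specification described above is sketched below. The notation is an assumption introduced here for illustration and does not appear in the original analysis plan: $Y_{it}$ is the outcome for girl $i$ at round $t$, $\alpha_i$ is the girl-level fixed effect, $R^{(3)}_t$ and $R^{(5)}_t$ are indicators for Rounds 3 and 5, and $T_i$ equals 1 if girl $i$ lived in a cluster invited to AGEP.

```latex
Y_{it} = \alpha_i
       + \beta_3 R^{(3)}_t + \beta_5 R^{(5)}_t
       + \delta_3 \bigl(T_i \times R^{(3)}_t\bigr)
       + \delta_5 \bigl(T_i \times R^{(5)}_t\bigr)
       + \gamma\, \mathrm{age}_{it}
       + \varepsilon_{it}
```

Under this formulation $\delta_3$ and $\delta_5$ are the DID estimates at Rounds 3 and 5. The main effect of $T_i$ does not appear separately because it is constant within girl and is therefore absorbed by $\alpha_i$; standard errors would be clustered at the CSA level, as stated in the text.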
To account for the varying levels of participation in AGEP observed across girls (due to girls not signing up for the program, drop-out or absences), a secondary analysis was conducted to obtain "treatment-on-the-treated" (TOT) estimates from two-stage least squares, instrumental variable (IV) regressions. The TOT estimates relied on instrumentation to account for unobservable differences that may have contributed to differential program participation.⁵ These unobservable factors may be correlated with the adolescent girl outcomes and would lead to biased estimates of impact if not statistically accounted for in the analysis. The first stage of the estimations predicted program participation, with the indicator of treatment defined as attendance at 52 or more meetings (half of the total girls group meetings). The instrumental variable used in the first stage was the randomized invitation to participate in AGEP. The validity of the instrumental variable was assessed with F-tests on excluded instruments, the Wald F-statistic and the Hansen statistic for overidentification. DID coefficients were estimated in the second stage of the regressions. All regressions were estimated with robust standard errors adjusted for clustering at the CSA level, and all models controlled for girl's age.
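The intuition behind the TOT estimates can be illustrated with the simplest IV case. With a single binary instrument (the randomized invitation) and no covariates, two-stage least squares reduces to the Wald ratio: the ITT effect divided by the difference in participation rates between invited and non-invited girls. The sketch below is a simplification, not the paper's estimator, which also includes girl-level fixed effects and age. It assumes that no girls in control clusters participated, and it uses the roughly 30% of invited girls who attended more than 52 sessions as the participation rate; the function name `wald_tot` is hypothetical.

```python
def wald_tot(itt_effect, takeup_invited, takeup_control=0.0):
    """Wald (ratio) IV estimate: ITT effect scaled by the first-stage difference in take-up."""
    first_stage = takeup_invited - takeup_control
    if first_stage <= 0:
        raise ValueError("the instrument must raise participation")
    return itt_effect / first_stage


# Illustration: an ITT effect of 6.7 percentage points on saving money,
# with about 30% of invited girls meeting the participation threshold.
tot_savings = wald_tot(itt_effect=6.7, takeup_invited=0.30)
scaling = tot_savings / 6.7  # how much larger the TOT is than the ITT
print(round(tot_savings, 1), round(scaling, 2))
```

With a participation rate near 30%, the scaling factor is a little over three, which is the order of magnitude one would expect for the ratio of TOT to ITT estimates under these assumptions.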
After the primary (ITT) and secondary (TOT) analyses were conducted, we explored whether the program had differential impacts at Round 5 across sub-groups of girls. To this end, we re-estimated the ITT regressions including interaction terms between the treatment indicators and the following characteristics: urban vs rural, older (aged 15-19 at baseline) vs younger (aged 10-14 at baseline), highest vulnerability quintile vs lower vulnerability quintiles, and poorest household wealth quintile vs less poor household wealth quintiles.⁶ For example, to test whether the intervention had a higher impact for girls who were in urban sites at baseline, the ITT regressions were re-estimated including interactions of the DID term with a variable indicating whether the girl was living in an urban site at baseline. The coefficient of this interaction indicated whether the impact of the intervention differed by baseline location. A similar approach was used to test the effects of age, levels of vulnerability and household wealth at baseline. All regressions here and above were estimated in Stata® 15.1.
⁵ Analysis of program uptake and participation showed that 26% of girls invited never attended, 44% of girls attended 52 sessions or fewer (representing half of all potential sessions one could attend) and 30% attended more than 52 sessions. The mean number of sessions attended (32.8) was balanced across program arms; therefore, program arm assignment or the added components of the intervention did not affect program participation. There was, however, differential program uptake: girls who were younger, living in rural areas, in school, with higher numeracy skills, or the biological daughter of the head of household were the most likely to participate in a greater number of sessions.
⁶ Secondary analysis of results by urban v. rural and older v. younger stratification was included in the original protocol and sample size calculations took these subgroups into account. Additional subgroup analyses on SES and vulnerability score were added at the time of analysis to determine whether the program effects on particularly vulnerable subgroups were being washed out by (or were washing out) effects on relatively less vulnerable sub-groups.

Results
Figure 1 shows the analytical sample flow by study arms. At baseline, 4661 never-married girls ages 10-19 were successfully interviewed, representing 88% of eligible girls drawn in the sampling frame.⁷ In Round 3, 88% of girls in the baseline sample were interviewed. In Round 5, 82% of girls in the Round 5 sampling frame (or 66% of the baseline sample) were interviewed. No differential attrition between intervention and control arms was observed. Girls interviewed at Round 5 constituted the analytical sample used in this paper, as all girls interviewed at Round 5 had been interviewed both at baseline and at Round 3. Results from the probit regression estimated to compare girls who were lost to follow-up to girls included in the analytical sample are shown in Additional file 2. Study arms were not associated with being excluded from the analytical sample. Girls excluded from the analytical sample were, at baseline, statistically significantly older and more likely not to co-reside with their mothers; they were also less likely to be in school and had lower grade attainment at baseline than girls included in the analytical sample.
⁷ 14% of girls in the research sampling frame were found not to be eligible as they had relocated, married, or were outside of the age range.
Table 1 shows descriptive statistics among girls in the analytical sample for selected baseline sociodemographic characteristics by intervention and control arms. As the analytical sample excluded girls who were part of the baseline sample but were lost to follow-up, tests of differences across arms were performed to assess whether attrition led to baseline imbalances across arms. As seen in the table, none of the F statistics were significant at the 0.05 level; thus, while attrition was not random (as discussed in the previous paragraph), it did not affect the baseline balance.
Table 2 provides the baseline and Rounds 3 and 5 means, by study arms, among the analytical sample for the primary indicators of interest across the three domains of empowerment addressed in AGEP (social assets, economic assets and health assets), as well as indicators of sexual behavior among respondents ages 15 and older who had ever had sex. In the domain of social assets, at baseline, respondents showed middling levels of self-efficacy, with an average score of 6 to 6.1 within a potential range of 0 to 10. Self-efficacy levels increased over time: at Round 5, respondents in the intervention arms scored 7.4 to 7.7 out of 10 and respondents in the control arm scored 7.3. A high percentage of girls (58 to 62%) did not have access to a safe place in the community to meet with friends at baseline, which contributed towards their social isolation. The percentage of respondents who reported no access to a safe space decreased over time, to 45 to 48% of respondents in the intervention arms and 48% of respondents in the control arm at Round 5.
While girls on average held positive gender norms (scoring 4.9 to 5.1 out of 7 at baseline), there was also a high acceptance of intimate partner violence (IPV), as 60 to 63% agreed at baseline that hitting/striking a spouse was acceptable in certain circumstances. No change was observed over time in respondents' measured gender norms, but acceptance of IPV appeared to increase as girls aged.
For measured economic assets, at baseline, respondents had an average score of 5.1 to 5.3 out of 9 in financial literacy, and only 14 to 16% had saved money in the past year. Financial literacy increased over time: at Round 5, respondents in the intervention arms scored 6.2 to 6.4 out of 9 and respondents in the control arm scored 6.2.
In the domain of health assets, while knowledge of the fertile period and contraceptive methods was low at baseline (1.6 to 1.7 out of 11), levels of HIV knowledge were higher (5.7 to 6.1 out of 11). On average, scores for both indicators increased over time, but knowledge of the fertile period and contraceptive methods remained low (3.6 to 3.8 out of 11 among respondents in the intervention arms and 3.5 among respondents in the control arm at Round 5), while the HIV knowledge score reached 8.9 to 9.1 out of 11 among respondents in the intervention arms and 8.9 among respondents in the control arm at Round 5.
Among girls 15 years and older who had reported having had sex at baseline, 34 to 46% reported having used a condom at first sex, and 48 to 59% reported having had transactional sex. Over time, as a larger group of girls in the analytical sample reached age 15 and older and initiated sex, the percentage of girls who had used a condom at first sex was only slightly higher (41 to 43% at Round 5) than at baseline, but the percentage of girls who reported having had transactional sex became considerably lower (36 to 40% at Round 5).
Table 2 also provides the prevalence of the longer-term educational and fertility outcomes of interest at baseline and at Rounds 3 and 5. The ITT estimates are provided in Table 3. The three intervention arms are combined and compared against the control arm, as estimates by arms showed few substantial differences across intervention arms on outcomes (results shown in Additional file 3).⁸ The left-hand panel of the table shows the coefficients representing the estimated DID between baseline and Round 3, and the right-hand panel shows the coefficients representing the estimated DID between baseline and Round 5. The table also shows 95% confidence intervals (CIs) for the estimated DID. Of the eight asset empowerment indicators, the program improved outcomes relative to the control at Round 3 across four indicators: 1) having a safe space in the community to meet with friends increased by 8.2 percentage points (95% CI 1.7-14.6); 2) financial literacy score saw a modest increase of 0.24 points out of 9 (95% CI 0.01-0.48); 3) whether girls had saved money in the past year increased by 6.7 percentage points (95% CI 1.8-11.5); and 4) sexual and reproductive health knowledge also saw a modest increase of 0.29 points out of 11 (95% CI 0.10-0.48). At Round 5, improved outcomes relative to the control were still observed in saving money (DID coef 6.7, 95% CI 2-11.3) and sexual and reproductive health knowledge (DID coef 0.27, 95% CI 0.06-0.48), and a positive but modest program impact emerged in self-efficacy, with a 0.31 higher score out of 10 (95% CI 0.02-0.6). Program impacts were not observed for girls' HIV knowledge and gender norms.
For the sexual behavior indicators, the ITT estimates indicated that while there was no impact on condom use at first sex, the program had a beneficial impact, decreasing the prevalence of transactional sex relative to the control by 11.9 percentage points at Round 3 (95% CI -23.4 to -0.5) and by 11.8 percentage points at Round 5 (95% CI -21.9 to -1.7), suggesting meaningfully lower exposure to adverse sexual and reproductive health risks.
No positive program impacts were observed for the primary education and fertility outcomes, either at the end of the intervention (Round 3) or two years after the end of the intervention (Round 5).
Table 3 Estimated difference-in-differences (DID) for intent-to-treat (ITT), results from linear regressions with girl-level fixed effects. All models adjust for age. Robust standard errors adjusted for clusters at the CSA level. *** p < 0.001, ** p < 0.01, * p < 0.05, † p < 0.1. a Estimated as simple differences at each round between intervention and control arms excluding girls who had ever had sex at baseline and adjusting for age and study site.
The TOT estimates are provided in Table 4. Estimates by intervention arms are presented in Additional file 4. The left-hand panel of the table shows the coefficients representing the estimated DID between baseline and Round 3, and the right-hand panel shows the coefficients representing the estimated DID between baseline and Round 5. The table also shows 95% confidence intervals for the estimated DID. The estimates from the two-stage, instrumental-variables regressions indicate the "undiluted" effect of the program, assuming that only girls who participated in at least 52 sessions were affected by the program. Across the indicators, the TOT estimates were approximately three times as large as the ITT estimates.
To assess the potential heterogeneity of program impact, additional estimation results are presented in Table 5. The table provides ITT assessments of impact on sub-groups that were hypothesized to potentially benefit differentially from the program: urban vs rural sites; older (aged 15-19 at baseline) vs younger (aged 10-14 at baseline); poorest household wealth quintile vs less-poor household wealth quintiles; and highest-vulnerability quintile vs lower-vulnerability quintiles. The results in Table 5 indicated that there were few meaningful differences among the mediating indicators of interest for sub-groups: relative to the younger cohort of girls, the impact on self-efficacy was 0.49 points out of 10 higher for older girls (95% CI -0.02 to 1) and non-acceptability of IPV was 11.7 percentage points higher for older girls (95% CI -0.5 to 23.9). The results for the longer-term fertility outcomes showed, however, that girls in the intervention arms in urban sites were 13.5 percentage points (95% CI 3.3-23.7) more likely to have initiated sex than girls in the rural sites. Further, the most vulnerable girls in the intervention arms were 16.2 percentage points more likely to be married (95% CI 0.8-31.7), 28 percentage points more likely to become pregnant (95% CI 14.1-41.8) and 21.3 percentage points more likely to give birth (95% CI 7.1-35.4).
Table 4 Estimated difference-in-differences (DID) for treatment-on-the-treated (TOT), results from two-stage least squares IV regressions with girl-level fixed effects. All models adjust for age. Robust standard errors adjusted for clusters at the CSA level. *** p < 0.001, ** p < 0.01, * p < 0.05, † p < 0.1. a Estimated as simple differences at each round between intervention and control arms excluding girls who had ever had sex at baseline and adjusting for age and study site.
These results suggest that the lack of impact (statistical and substantive) in the ITT estimates in Table 3 is not likely masking underlying significant impacts for certain subgroups, while the apparent negative impact on sex initiation in Table 3 seems driven by girls in the urban sites.

Discussion
The results presented in this paper assessing the impact of a girls' empowerment program add to the literature evaluating interventions that focus on building girls' assets as means to overcome their vulnerability. AGEP was designed to improve shorter-term outcomes as captured by social, health and economic assets, with the goal of improving longer-term educational and fertility outcomes such as delayed sexual debut, marriage and pregnancy. Recent systematic reviews of the published and grey literature on interventions to improve outcomes of unintended pregnancy, STIs and child marriage among adolescents have shown that a diversity of interventions have effects, yet there is no one intervention that works in all contexts and for all outcomes [20][21][22]. For example, a combined economic strengthening and sexual and reproductive health education intervention for adolescent girls increased income generation, reduced pregnancy and delayed marriage in Uganda [14], yet when replicated by the same organizations in Tanzania, the interventions had no effect [23].
Table notes: All models adjust for age. Robust standard errors adjusted for clusters at the CSA level. *** p < 0.001, ** p < 0.01, * p < 0.05, † p < 0.1. a Estimated as simple differences at Round 5 between intervention and control arms excluding girls who had ever had sex at baseline and adjusting for age and study site.
In the case of AGEP in Zambia, the hypothesis was not confirmed, and although there was sustained change on a limited number of assets, namely SRH knowledge, self-efficacy and savings, the intervention did not lead to a combined set of social, health and economic assets. In addition, the shorter-term changes that did occur did not result in longer-term impacts on education or fertility for vulnerable girls in the Zambian context. Several factors must be considered in understanding why the intervention did not result in the hypothesized changes.
One factor is the low participation, as only 30% of the sample participated in half or more of the program sessions, and 25% did not participate at all. While the TOT analysis did not show meaningful change for those who did participate, understanding who attended AGEP sessions is one component of understanding the effects of the program, as well as potential gaps in content. Given that AGEP purposefully targeted the most vulnerable girls in each community, it is likely that they face social and economic barriers at the individual, household and community levels that prevent both participation and effectiveness of the program components. It is possible that a program focused on girls' asset building alone, even if it combined health and economic components, was not meaningful enough for girls and their families to lead to participation. Potentially, girls from the most disadvantaged households need additional interventions that address the household economic status, which may prevent girls from participating in the first place. It is also possible that without addressing the economic constraints at the household level, even participation in a girls' empowerment program alone is not enough to impact longer-term outcomes such as educational attainment or timing of pregnancy. More research needs to be done to understand what will overcome the barriers to participation for the most disadvantaged, even if the findings suggest that more complex, and likely costly, interventions are necessary. One particularly promising intervention approach might be the combination of targeted cash transfers and direct girls' empowerment programming [24], as it simultaneously can address household poverty, as well as build individual assets for girls.
A second factor is the social and cultural context in which this intervention took place. With 60% of girls agreeing that intimate partner violence is acceptable, two-thirds having experienced physical violence and half having experienced sexual violence (defined as being forced into sexual intercourse or other sexual acts), it is likely that an intervention targeting only asset building for girls is not sufficient to change attitudes around gender roles or acceptability of violence. Taking a socioecological approach [25] in the future may lead to different results. Future programs in this context need to address norms at the household and community level, in addition to direct interventions with girls. This is especially relevant in thinking about the beliefs held by the mentors (the women facilitating the weekly group sessions) and how the social norms around gender to which they subscribe mediate the translation of program content to the participants in their groups. This is reinforced by looking at the type of outcomes for which there was positive change (self-efficacy, sexual and reproductive health knowledge, savings behavior and transactional sex): these are potentially outcomes over which an adolescent may exercise more control. Other outcomes (gender norms, recent condom use, school completion, timing of marriage) involve external factors, e.g. parents and sexual partners, and therefore may be harder to change with an intervention that focuses directly on girls.
There are several limitations to the study and interpretation of its results. The first is that this is a single study conducted in one cultural context, so the ability to generalize to other settings may be limited. Of course, this is a limitation of all studies. The second is the greatly reduced sample for Round 5, which likely reduced precision and introduced selection bias into the analytical sample vis-à-vis age, mother's co-residence and educational attainment. This could affect the validity of our findings. Third, the trial was not registered prospectively; however, the design was not changed from the protocol submitted for ethical approval. Finally, we did not correct for the testing of multiple hypotheses. However, given the lack of positive results, we do not think this would influence our findings.

Conclusions
Overall, the results of the AGEP trial provide an important contribution to the field of adolescent programming and understanding what combinations of interventions work and do not work, in different contexts. These findings suggest that when addressing long-term educational and fertility outcomes among very vulnerable adolescent girls, interventions that only target the girl herself may not be sufficient.
Additional file 1. Mediating indicators and longer-term outcomes.
Additional file 2. Results from a probit regression for attrition.
Additional file 3. Estimated difference-in-differences (DID) for intent-to-treat (ITT) by intervention arm, results from linear regressions with girl-level fixed effects.
Additional file 4. Estimated difference-in-differences (DID) for treatment-on-the-treated (TOT) by intervention arms, results from two-stage least squares IV regressions with girl-level fixed effects.