Effectiveness of active school transport interventions: a systematic review and update

Background: Active school transport (AST) is a promising strategy to increase children’s physical activity. A systematic review published in 2011 found large heterogeneity in the effectiveness of interventions in increasing AST and highlighted several limitations of previous research. We provide a comprehensive update of that review.
Methods: Replicating the search of the previous review, we screened the PubMed, Web of Science, Cochrane, SPORTDiscus and National Transportation Library databases for articles published between February 1, 2010 and October 15, 2016. To be eligible, studies had to focus on school-aged children and adolescents, include an intervention related to school travel, and report a measure of travel behaviors. We assessed quality of individual studies with the Effective Public Health Practice Project quality assessment tool, and overall quality of evidence with the Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) approach. We calculated Cohen’s d as a measure of effect size.
Results: Out of 6318 potentially relevant articles, 27 articles reporting 30 interventions met our inclusion criteria. Thirteen interventions resulted in an increase in AST, 8 found no changes, 4 reported inconsistent results, and 5 did not report inferential statistics. Cohen’s d ranged from −0.61 to 0.75, with most studies reporting “trivial-to-small” positive effect sizes. Three studies reported greater increases in AST over longer follow-up periods and two Safe Routes to School studies noted that multi-level interventions were more effective. Study quality was rated as weak for 27/30 interventions (due notably to lack of blinding of outcome assessors, unknown psychometric properties of measurement tools, and limited control for confounders), and overall quality of evidence was rated as low. Evaluations of implementation suggested that interventions were limited by insufficient follow-up duration, incomplete implementation of planned interventions, and limited access to resources for low-income communities.
Conclusions: Interventions may increase AST among children; however, there was substantial heterogeneity across studies and quality of evidence remains low. Future studies should include longer follow-ups, use standardized outcome measures (to allow for meta-analyses), and examine potential moderators and mediators of travel behavior change to help refine current interventions.
Trial registration: Registered in PROSPERO: CRD42016033252
Electronic supplementary material: The online version of this article (10.1186/s12889-017-5005-1) contains supplementary material, which is available to authorized users.


Background
Consistent evidence shows that children and adolescents who engage in active school transport (AST) are more physically active than those who travel by motorized vehicles [1,2]. Cycling to and from school can also increase cardiovascular fitness [1] and is associated with a better cardiometabolic health profile [3]. At the population level, replacing motorized travel by AST could reduce exhaust and greenhouse gas emissions [4,5]. Additional benefits of AST include positive emotions during the school trip [6], better way-finding skills [7] and superior school grades [8].
Despite these benefits, the prevalence of AST has decreased markedly during the last few decades in many countries [9][10][11][12][13]. To address this issue, many interventions have been implemented. Perhaps the most well-known is the Safe Routes to School (SRTS) program, which has received over one billion dollars in funding from the US government [14]. Recent analyses concluded that New York City's SRTS program led to a 33-44% reduction in injuries among school-aged children and that the program was cost-effective even when disregarding any potential benefits related to increased physical activity and decreased congestion and pollution [15,16]. In other jurisdictions, school travel plans (STP) have been implemented to address key barriers to AST at the local level, but often with limited funding [17][18][19][20][21]. Moreover, walking school buses (WSB), wherein children walk together on a set route with adult supervision, have been implemented in many jurisdictions to address parental safety concerns [22,23].
To our knowledge, Chillón and colleagues [24] published the first systematic review of the effectiveness of AST interventions among children and adolescents in 2011. While the included interventions were quite heterogeneous, most observed small increases in AST. However, quality of evidence for all interventions was rated as "weak" based on the Effective Public Health Practice Project (EPHPP) quality assessment tool for quantitative studies [25]. Moreover, none of the interventions examined the moderators and mediators of travel behavior change. A better understanding of moderators and mediators would enable researchers to understand what works for whom and why. We provide a comprehensive update on the effectiveness of AST interventions in children and youth that have been published over the last 6 years. We also aimed to review the literature on the moderators and mediators of AST interventions.

Search strategy
As our goal was to update the previous review [24], we replicated their search strategy. Databases searched included PubMed, Web of Science (SCI and SSCI), SPORTDiscus, the Cochrane library, and the National Transportation Library. The search terms addressed four main categories: school-age children (adolescen* OR child OR children OR youth OR student* OR pupil OR pupils) AND active transportation (bike OR bikers OR biking OR bicycl* OR cycle OR cycling OR cyclist* OR commute* OR commuting OR transportation OR travel*) AND intervention (intervention* OR implement* OR evaluat* OR change OR pilot OR project OR environment* OR engineer* OR encourage* OR planning OR impact OR "walk to school" OR "safe routes to school" OR "walking schoolbus" OR "walking school bus" OR "walking school buses") AND school. Articles published between February 1, 2010 (the cut-off date of the previous review) and October 15, 2016 were considered eligible. Our review is registered in PROSPERO (CRD42016033252; see http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42016033252).
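As an illustration only, the Boolean search string described above could be assembled programmatically. The sketch below copies the term groups from the text; the function names (`or_group`, `build_query`) are hypothetical, and each database may require its own syntax for truncation and phrase searching, which is not handled here.

```python
# Term groups copied from the search strategy described in the text.
POPULATION = ["adolescen*", "child", "children", "youth", "student*", "pupil", "pupils"]
TRANSPORT = [
    "bike", "bikers", "biking", "bicycl*", "cycle", "cycling", "cyclist*",
    "commute*", "commuting", "transportation", "travel*",
]
INTERVENTION = [
    "intervention*", "implement*", "evaluat*", "change", "pilot", "project",
    "environment*", "engineer*", "encourage*", "planning", "impact",
    "walk to school", "safe routes to school", "walking schoolbus",
    "walking school bus", "walking school buses",
]


def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query():
    """Combine the three term groups and the word 'school' with AND."""
    groups = [or_group(POPULATION), or_group(TRANSPORT), or_group(INTERVENTION)]
    return " AND ".join(groups + ["school"])


if __name__ == "__main__":
    print(build_query())
```

Keeping the term lists as data makes it easier to rerun an identical search when a review is updated again.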

Inclusion and exclusion criteria
To be included in the review, studies had to: 1) have been conducted among children and adolescents (6-18 year olds); 2) focus on AST; 3) include an intervention; and 4) examine the effect of the intervention on a measure of active transportation or physical activity. Studies that did not meet all of these criteria were excluded. Language was not an exclusion criterion. Titles and abstracts of all potentially relevant articles were screened by GM and GF. Full text copies of all articles that were not excluded at this stage of the review were then screened by GM and GF. Any discrepancy was resolved by consensus.

Data extraction
The following data were extracted from each included study: lead author, country, a brief description of the intervention and its methodology, the effects on AST, the moderators and mediators examined, the effects on other outcomes and the types of strategies that were used based on the Safe Routes to School 6E model [14]. The 6 E's are: 1) education (teaching students and community members about the different transportation options and ensuring they have the skills and know-how to be safe in traffic); 2) encouragement (using events, activities and incentives to promote AST); 3) engineering (making improvements to the built environment to increase safety); 4) enforcement (partnering with law enforcement to address traffic and crime concerns in the neighborhoods around schools and along school routes); 5) evaluation (assessing the effectiveness of the interventions); and 6) equity (ensuring that initiatives are benefiting all demographic groups). By definition, all studies that met our inclusion criteria have used evaluations, so this strategy was not extracted. Data extraction was done by RL and GM for a subsample of studies, and only by RL for the remainder. When relevant information was missing from included papers, we attempted to contact the lead author and/or the senior author.

Quality assessment
To assess the methodological quality of each study, we used an adapted version of the EPHPP. This tool includes 6 components: 1) selection bias; 2) study design; 3) control for confounders; 4) blinding of participants and study staff; 5) validity and reliability of the data collection tools; and 6) withdrawals and drop-outs. Each component was rated as "weak", "moderate" or "strong" based on standardized criteria, and then the number of weak ratings was tallied. Following the EPHPP approach, studies with zero weak ratings were rated as strong, studies with one weak rating were rated as moderate, and studies with at least two weak ratings were rated as weak. We retained the modifications proposed by Chillón and colleagues [24] to make the tool more suitable to studies in which the school is the unit of allocation. We also added a number of clarifications to guide the interpretation of the items. Our adapted EPHPP tool is available in Additional file 1. Quality assessment was first performed by RL and DR for a subsample of five studies. After consensus was attained for these studies, the remaining articles were assessed either by RL or DR. In case of doubt, the reviewer was asked to indicate the issue in an Excel spreadsheet and all issues were resolved by consensus between the two reviewers. Because blinding of participants was considered unfeasible in the context of most AST interventions, we present results both with and without the blinding component of the EPHPP. In addition, we assessed overall quality of evidence using the "Grades of Recommendation, Assessment, Development, and Evaluation" (GRADE) approach [26,27]. Following this approach, randomized controlled trials begin as high quality evidence, but they may be downgraded based on limitations in the design and implementation, indirectness of evidence, unexplained heterogeneity of results, imprecision of estimates, and high probability of publication bias. Observational studies begin as low quality evidence, but may be upgraded if there are large effect sizes, a dose-response gradient, or if all plausible confounding would reduce the treatment effect [26,27]. The overall quality of evidence was rated by consensus among the authors.
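The global rating rule described above (zero weak ratings = strong, one = moderate, two or more = weak) can be sketched as follows. This is a minimal illustration, not the adapted tool itself: the component keys, the function name `global_rating`, and the example study are hypothetical, and ratings other than "weak" (including "non-applicable") are assumed not to count toward the tally.

```python
# Hypothetical keys for the six EPHPP components listed in the text.
COMPONENTS = (
    "selection_bias", "study_design", "confounders",
    "blinding", "data_collection", "withdrawals",
)


def global_rating(ratings, include_blinding=True):
    """Return the global rating from per-component ratings.

    ratings: dict mapping component name to "weak", "moderate", "strong"
             (or "non-applicable", which is assumed not to count as weak).
    include_blinding: set to False for the analysis without blinding.
    """
    considered = {
        name: value for name, value in ratings.items()
        if include_blinding or name != "blinding"
    }
    n_weak = sum(1 for value in considered.values() if value == "weak")
    if n_weak == 0:
        return "strong"
    if n_weak == 1:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Invented example: strong on every component except blinding.
    study = {name: "strong" for name in COMPONENTS}
    study["blinding"] = "weak"
    print(global_rating(study))                          # with blinding
    print(global_rating(study, include_blinding=False))  # without blinding
```

In this invented example, a study that is weak only on blinding is capped at "moderate" when blinding is counted, which is why results are presented both ways.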

Statistical analyses
Following the procedures of Chillón and colleagues [24], we computed Cohen's d as a measure of effect size for each intervention. For interventions that included a control group, effect size was computed as the standardized mean difference of the changes in AST between the experimental and control groups. For those that included only an experimental group, it was computed between baseline and follow-up data. Additional file 2 provides comprehensive details on how effect sizes were computed for each intervention. Authors were contacted to obtain information required to calculate d. Following Cohen's [28] guidelines, effect size was categorized as trivial (d < 0.2), small (0.2 ≤ d < 0.5), medium (0.5 ≤ d < 0.8), or large (d ≥ 0.8). Due to the large methodological heterogeneity of the included studies (see Table 1), meta-analysis was considered inappropriate.
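To make the two cases concrete, the sketch below shows one common way of computing a standardized mean difference and labeling it with Cohen's thresholds. It is an assumption-laden illustration: the exact computations used for each intervention are those in Additional file 2, whereas the pooled-standard-deviation formula, the function names, and the choice to classify on the absolute value of d are ours for this example.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two groups (assumed standardizer)."""
    numerator = (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    return math.sqrt(numerator / (n_a + n_b - 2))


def cohen_d_controlled(change_exp, change_ctrl, sd_exp, n_exp, sd_ctrl, n_ctrl):
    """d for controlled designs: difference in mean change, standardized."""
    return (change_exp - change_ctrl) / pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl)


def cohen_d_pre_post(mean_pre, mean_post, sd_pre, n_pre, sd_post, n_post):
    """d for single-group designs: follow-up minus baseline, standardized."""
    return (mean_post - mean_pre) / pooled_sd(sd_pre, n_pre, sd_post, n_post)


def categorize(d):
    """Label the magnitude of d using Cohen's thresholds (0.2, 0.5, 0.8)."""
    magnitude = abs(d)  # assumption: decreases are labeled by their size
    if magnitude < 0.2:
        return "trivial"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    d = cohen_d_controlled(1.0, 0.5, 2.0, 50, 2.0, 50)
    print(round(d, 2), categorize(d))
```

Because the thresholds are cut-offs, two nearly identical values can fall on either side of a boundary and receive different labels.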

Results
The flow of papers in the review process is depicted in Fig. 1. Overall, 6318 papers were identified by the search including 2339 in PubMed, 1555 in Web of Science, 377 in Cochrane, 882 in SPORTDiscus, and 1165 in the National Transportation Library. One paper was identified from the authors' personal libraries. All abstracts were screened, and 54 papers were found to be potentially eligible for inclusion. After a thorough selection process, 27 papers were excluded due to the following exclusion criteria: no/ineligible intervention, n = 17; no measure of physical activity or AST, n = 8; review article, n = 2. A total of 27 papers, reporting on 30 different interventions, were included for analyses [17][18][19][20].
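The counts reported in this paragraph can be checked with simple arithmetic. The short script below only restates the numbers given in the text; the variable names are ours.

```python
# Records identified per database, as reported in the text.
records = {
    "PubMed": 2339,
    "Web of Science": 1555,
    "Cochrane": 377,
    "SPORTDiscus": 882,
    "National Transportation Library": 1165,
}

# Full-text exclusions by reason, as reported in the text.
excluded = {
    "no/ineligible intervention": 17,
    "no measure of physical activity or AST": 8,
    "review article": 2,
}

total_identified = sum(records.values())
potentially_eligible = 54
included = potentially_eligible - sum(excluded.values())

if __name__ == "__main__":
    print(total_identified, sum(excluded.values()), included)
```

The database counts add up to the 6318 records identified by the search, and 54 potentially eligible papers minus 27 exclusions leaves the 27 included papers.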
Results are presented at the intervention level because three papers reported the findings of two different interventions. Specifically, Buckley et al. [29] included a fall event without control group and a spring event with control group, Crawford and Garrard [34] included a pilot study with control schools (pilot schools) and a main study without control group (program schools), and Johnson et al. [41] reported case-control analyses using data from two different surveys conducted in distinct populations (Bikeability and CensusAtSchool). Eleven interventions were conducted in the US, five in the UK, three in Canada, two each in Australia, Belgium, Denmark and New Zealand, and one each in Spain and China. Another intervention was conducted simultaneously in Canada and the UK.

Characteristics of interventions
Of these interventions, six evaluated Safe Routes to School (SRTS) interventions [38,39,42,43,46,48], seven evaluated school travel plan (STP) projects [17-20, 30, 34], and two examined stand-alone walking school buses (WSB) schemes [45,47]. Four interventions focused on the effects of bicycle training programs [35,36,41], five examined the effects of stand-alone events or contests [29,31,33,40], and two were multi-component interventions that examined, among other things, changes in AST following the intervention [32,51]. Others included two studies examining the effect of curriculum-based programs on AST [44,50], one intervention using a drop-off spot from which driven children could walk to school with adult supervision [49], and an investigation of the effect of [...]

Table 1 (excerpt)
Sample: [...] (2 control and 2 intervention); participants were ages 10-13, but younger children were also counted in the observations. Duration: 9-12 months.
AST measure: observation counts + student hands-up survey (trips to school only, all active modes combined).
Effects on AST: Observation results show that AST increased significantly in the inner suburban pilot school and the outer suburban control school. Hands-up surveys show that AST increased in the inner suburban pilot school and no changes were found in the other schools.
Implementation: Qualitative data suggest that the program was easier to implement within a school that was smaller, more established, with a culture that was accepting and enthusiastic about AST, in an area of higher density and lower car use, with more infrastructure improvements and a more "hands-on" approach from the Coordinator.
Moderators: Results differed by level of urbanization (see "effects on AST" column).
Mediators: None examined.

Crawford & Garrard
Moderators: Larger increase in AST with longer follow-up period. The program was more effective in older students, in smaller schools and in the city of Auckland, but it was less effective in low SES schools (all p < 0.05).
Mediators: None examined.

Quality assessment
Quality ratings are shown in Table 2. For individual components of the EPHPP, the proportion of weak ratings was 3.3% for study design, 30.0% for withdrawals and dropouts, 56.7% for selection bias, 60.0% for control for confounders, 66.7% for data collection methods, and 100% for blinding. Following Chillón and colleagues' [24] modifications of the EPHPP, four studies were rated "non-applicable" for withdrawals and dropouts because participants were recruited after the intervention occurred and could not have dropped out. No study reported that outcome assessors or participants were blinded, and only two studies discussed blinding and specified that it was not feasible in their intervention [35,45]. In analyses that included the blinding component of the EPHPP tool, only three studies were rated as "moderate" [32,39,45], and the remainder were rated as "weak". In a sensitivity analysis that excluded the blinding component, study quality was rated as weak for 21 interventions [19, 20, 29-31, 33, 34, 36, 38, 40-43, 46, 48-51], moderate for six interventions [17,18,35,37,44,47], and strong for three interventions [32,39,45]. While our review included some randomized controlled trials, most individual studies were rated as "weak" and very serious limitations in the design and implementation of interventions were noted, as mentioned above. Therefore, we attributed a low grade for the overall quality of evidence.
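For readers who prefer counts to percentages, the proportions of weak ratings can be converted back to numbers of interventions. The sketch below assumes that every percentage uses all 30 interventions as its denominator, which the text does not state explicitly; the resulting counts are therefore derived values, not figures reported in Table 2.

```python
N_INTERVENTIONS = 30  # assumed denominator for every percentage

# Percentage of weak ratings per EPHPP component, as reported in the text.
weak_share = {
    "study design": 3.3,
    "withdrawals and dropouts": 30.0,
    "selection bias": 56.7,
    "control for confounders": 60.0,
    "data collection methods": 66.7,
    "blinding": 100.0,
}

# Derived counts (rounded to the nearest whole intervention).
weak_counts = {
    component: round(share / 100 * N_INTERVENTIONS)
    for component, share in weak_share.items()
}

# Global ratings in the analysis without blinding, as reported in the text.
sensitivity = {"weak": 21, "moderate": 6, "strong": 3}

if __name__ == "__main__":
    for component, count in weak_counts.items():
        print(f"{component}: {count}/{N_INTERVENTIONS}")
    print(sum(sensitivity.values()))
```

Under this assumption, the three rating categories of the analysis without blinding account for all 30 interventions.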

Intervention effectiveness
Overall, 13 interventions resulted in a statistically significant increase in AST [18,29,31,38,41,42,45,47-51], while eight reported no changes in AST [20,32,33,35,37,43,46,47]. Of the latter studies, McMinn et al. [43] reported a smaller seasonal decline in physical activity (PA) among children in their intervention group, and this can be viewed as a positive finding given that PA typically declines during the fall and winter. Five interventions did not include a hypothesis test for changes in AST [17,19,29,30,40]. The remaining studies reported inconsistent or conflicting results. Specifically, in their pilot study, Crawford & Garrard [34] reported a significant increase in AST in their inner suburban school, but no change in their outer suburban school relative to the control group. In their "program" phase, they reported a significant increase in AST in experimental schools based on parent surveys after adjusting for confounders, but their child surveys indicated no change in AST after statistical adjustment. Goodman and colleagues [36] reported that children attending a school that had offered the Bikeability program did not cycle more frequently; however, those who actually took part in Bikeability did cycle more frequently, suggesting that parents/children interested in cycling may have self-selected to participate. Finally, Hoelscher et al. [39] observed that while intervention schools had higher rates of AST over the 4-year study period, the differences between groups waned over time.
Details on the computation of effect sizes (Cohen's d) are provided in Additional file 2. Cohen's d varied markedly across interventions with a range of −0.61 to 0.75. Effect size could not be calculated for five interventions, including two that provided only follow-up data [19,42], and three that provided insufficient data to allow for computation of d [29,39]. Effect size was rated as trivial for 10 interventions [17,20,30,32,34,36,37,43,46,47], small for eight interventions [31,33,39,43,48,50,51], and medium for one intervention [49]. Data from Hinckson et al. [18] indicate a trivial effect size after 1 year of follow-up, but a medium effect size after 2 or 3 years. Henderson and colleagues' [38] SRTS intervention yielded a medium effect size for the morning trip and a trivial effect size for the afternoon trip. Data from Hunter et al. [39] indicated a medium decrease in AST as estimated with the swipe card methodology, but a small increase for self-reported AST. In Crawford and colleagues' [34] pilot program, there was a small effect size for the inner suburban school and a trivial one for the outer suburban school. In the 3-group intervention by Ducheyne et al. [35], there was a small effect size when comparing the intervention and control groups, but a trivial effect size when comparing the intervention + parent group (which targeted parents in addition to children) vs. the control group. Finally, data from McMinn et al. [44] suggest a small effect size for changes in minutes of moderate-to-vigorous PA (MVPA) per day, but a medium effect size for changes in steps/day, although both effect sizes were similar (d = 0.46 and 0.52 respectively); however, effect size was trivial for changes in steps and MVPA during the school trip. Table 3 summarizes effect sizes by type of intervention; however, no clear pattern is evident.

Moderators and mediators
Thirteen studies examined potential moderators. Hinckson et al. [17,18] noted that longer follow-up periods, smaller school size, higher school SES, and higher pre-intervention rate of AST predicted higher rates of AST at follow-up. Safe Routes to School interventions using multiple strategies (as defined by the 6E model) achieved larger increases in AST [42,43], and a longer follow-up period was also associated with more substantial increases in AST [43]. In contrast, a short follow-up period was discussed as a potential reason for the lack of a significant mode shift in other interventions [20,46]. Mammen and colleagues [19] reported that parents of older students, those living closer to school and attending urban or suburban schools (relative to rural) were more likely to report "driving less" following the implementation of an STP. Of the potential moderators examined by Stewart et al. [48], only the percentage of students cycling at baseline was negatively associated with changes in cycling. In addition, Mendoza and colleagues' [45] results suggest that greater acculturation, more positive parental self-efficacy and outcome expectations may facilitate children's engagement in AST.
Goodman and colleagues [36] intended to assess children's participation in cycle training as a mediator of the relationship between exposure to the Bikeability program at the school level and children's cycling behavior. However, they found a similar frequency of cycling among children exposed and unexposed to the program. No other study described formal mediation analyses.

Discussion
We have provided a comprehensive update on the effectiveness of AST interventions among children and adolescents. Our search strategy identified 27 papers, describing the findings of 30 distinct interventions, which have been published since the previous review [24]. Included interventions were quite diverse and changes in travel behaviors varied markedly across interventions. Included studies suggest that interventions with longer follow-up periods may achieve greater modal shifts. These observations are of particular importance for policymakers and practitioners implementing AST interventions.
Two large SRTS interventions found that interventions including both educational activities and infrastructure changes resulted in greater increases in AST than interventions using only one of these strategies [42,43]. These results are consistent with social-ecological models that posit that behavior is determined by multiple levels of influence including individual, interpersonal, community, policy and built environment factors [52,53].
We noted that few interventions targeted secondary school students. As the correlates of AST may differ by age [54], one should not assume that interventions that are effective among children will work as well with adolescents. Adolescents generally have higher independent mobility [55] and, as such, the influence of parental perceptions on their school travel mode may be weaker. However, adolescents may have less favorable attitudes toward AST [56,57], and this might be a key factor to address for interventions in secondary schools.
In the previous systematic review [24], all studies were rated as "weak" based on the EPHPP tool. In our review, 10% of the studies were rated "moderate" (even with a stricter interpretation of the blinding component of EPHPP) and, when the blinding component was dismissed as unfeasible, 30% of the studies were rated as "moderate" or "strong". This suggests a marginal improvement in study quality over the last 6 years; however, the overall quality of evidence as assessed with the GRADE approach remains low. Our sensitivity analysis shows that the blinding component exerted a floor effect on quality scores. Because all interventions received a "weak" rating for blinding, they could not be rated higher than "moderate". Future improvement in quality ratings could be made by controlling for confounders and by using valid and reliable measures of AST, which have been reviewed elsewhere [58]. The calculated effect sizes for most interventions were trivial to small based on Cohen's [28] thresholds. Although these widely-used thresholds are arbitrary, we have used them in the absence of alternative options. Given the large reach of interventions such as SRTS and STP, an effect size labeled as "trivial-to-small" may still be highly relevant from a population health perspective. Interestingly, a pooled intervention effect of d = 0.12 was obtained in a meta-analysis of 30 controlled trials on PA interventions among children and adolescents [59].
Furthermore, while our review focused specifically on the effect of interventions on travel behaviors, some included interventions have documented positive changes in other important outcomes such as children's cycling skills [35], safe street crossing behaviors [37], attitudes toward AST [40], and higher daily PA [44,45]. Substantial reductions in road traffic injuries among children have also been noted following implementation of SRTS [15]. More broadly, it has been proposed that interventions such as SRTS may benefit the larger communities in which they are implemented, and not only children [60].

Mediators and moderators
A better understanding of the mediators and moderators of AST interventions could help identify what works for whom and why [61,62]. Of particular interest, many studies emphasized the importance of having long-term follow-ups given that implementation of complex AST interventions may require a substantial amount of time [17-20, 43, 46]. Similarly, qualitative evaluations focusing on the implementation of AST interventions also identify lack of time as a key challenge [63,64]. To address the issue of follow-up length, some authors suggested that granting agencies should be encouraged to provide longer-term funding [63,64].
While there has been increased interest in studying moderators of AST interventions, none of the included studies conducted formal mediation analyses and most interventions did not include an explicit theoretical framework. Given the important role of parents in travel mode decision making [65], interventions that increase road safety may be more effective if they also target parents' self-efficacy in allowing their child to engage in AST [45].

Implementation of interventions
Understanding the implementation of complex AST interventions may provide valuable information for the reader to contextualize the effectiveness of such interventions. This may be particularly important for interventions such as SRTS and STP that are essentially evaluated as "natural experiments" [66] because in most cases, exposure to the intervention is not under the control of the investigators. This is a threat to internal validity because the fidelity of implementation varies, but at the same time, it represents more closely how an intervention is implemented in the "real world". Many interventions included in this review reported that implementation varied substantially between schools [19,20,32,34,46], and in some cases, planned changes were not implemented as scheduled [32,37,46]. Crawford and Garrard [34] also reported that the implementation of the Ride2School program was affected by the motivation of school communities. Such challenges and discrepancies may bias our results toward the null hypothesis.
Lack of resources or unequal access to resources has been noted by many authors as a limitation to AST interventions [32,63,64]. In Canada, STPs and WSBs are implemented by non-governmental organizations and lack of support from provincial and federal governments has been identified as a major barrier [64]. In Texas, stakeholders expressed difficulty in navigating the SRTS regulatory process and emphasized that access to SRTS funding was very challenging for low-income communities given that no up-front funding was provided [63]. More generally, WSBs typically rely on volunteers, which often makes long-term sustainability challenging [23,67]. Providing paid WSB leaders may help overcome this issue.

Strengths and limitations
As in the previous review [24], we noted that many included studies did not include a control group. Another limitation is that the original EPHPP tool seems better suited to assess studies where the unit of allocation is the individual. To address this issue, we have modified the tool so that the questions are more relevant to school-based interventions (see Additional file 1). Nevertheless, like other quality assessment tools, the scoring system of the EPHPP is rigid and may not always distinguish more robust studies from weaker ones [68]. For example, in our review, no study reported that outcome assessors were blinded, creating a floor effect whereby no intervention can be rated higher than "moderate". Notwithstanding the importance of blinding in preventing observer bias and Hawthorne effects, a quality assessment tool should be able to discriminate stronger studies from weaker ones. Our sensitivity analysis without the blinding component of the EPHPP was intended to address this issue. We acknowledge that the use of a different quality assessment tool could have resulted in different ratings of study quality as observed previously [68]. Finally, the large heterogeneity in the measurement and operationalization of AST precluded meta-analysis. The development of a standard measurement protocol may help address this issue.
The rigorous systematic review process is an important strength of the study. We followed the same search strategy as Chillón and colleagues [24] and computed standardized effect sizes which should help readers interpret the effectiveness of interventions and perform sample size calculations. Finally, the discussion of moderators, mediators and factors related to implementation should help researchers refine current interventions.
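As a rough illustration of how a standardized effect size feeds into a sample size calculation, the sketch below uses the usual normal-approximation formula for comparing two independent group means, n per group = 2(z_{1-α/2} + z_{power})² / d². This is our assumption for the example, not a method used in the review; it ignores clustering of children within schools, which would typically increase the required sample, and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d.

    Two-sided test, two independent groups of equal size, normal
    approximation; no adjustment for school-level clustering.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    for effect in (0.2, 0.5):
        print(effect, n_per_group(effect))
```

Under these assumptions, a small effect (d = 0.2) requires roughly six times as many participants per group as a medium effect (d = 0.5), which illustrates why trivial-to-small effects are hard to detect in modestly sized studies.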

Conclusions
The present systematic review highlights the diversity of interventions that have been implemented to promote AST in the last few years, and shows that travel behavior change varied markedly between interventions. Many interventions have shown significant increases in AST, but caution is required in interpretation given the low quality of evidence. This underscores a need for interventions using stronger study designs.
Our findings have implications for researchers and practitioners. First, it may take time for interventions to have an effect on children's travel behaviors. Therefore, follow-ups of at least 2 years should be conducted when possible to minimize the risk of type II error. Second, while many authors indicated that implementation of interventions varied markedly across schools, it is unclear how this variation may influence effectiveness. Hence, future research should examine the potential moderating effect of implementation. The fact that some interventions were not implemented as planned suggests that some of the effect sizes reported herein may be conservative. Third, there remains a clear need for investigation of the mediators of travel behavior change.
Only three interventions included some high schools, highlighting a need for more research intervening in secondary school settings. This is important given that the factors associated with AST may differ markedly between children and adolescents. Finally, because some children may live too far from their school, interventions aiming to promote active transportation to/from other destinations such as parks, shops, sport venues, and friends' and relatives' houses may also be warranted [69].