Study design and approach
A mixed-methods, quasi-experimental design was used to evaluate the SOAR program pilot. Quantitative data collection examined program effects via student surveys administered pre- and post-intervention, and also enabled exploration of how appropriate these survey measures were for evaluating this program. The primary outcomes of the student survey were student-reported bystander responses, self-efficacy to intervene, peer pro-social norms, and perceived school climate. Secondary outcomes were student social and emotional wellbeing, school connectedness, sleep, and experiences of racial discrimination. Quantitative longitudinal measures of change were collected via student questionnaires administered at baseline and follow-up. Qualitative data included interviews and focus group discussions exploring the experiences of school students and staff participating in the program, and student and staff views on the perceived need for the program, program implementation, program impacts and suggested improvements. The study was pre-registered with the Australian New Zealand Clinical Trials Registry (ACTRN12617000880347).
To recruit schools, emails describing the SOAR program were sent in late 2017 to primary schools in two Australian states (New South Wales (NSW) and Victoria), and information about the study was disseminated via community and education networks. This information described the goals and content of SOAR and included details about how to join the study. Four volunteer schools (two in NSW and two in Victoria) agreed to participate in the study as intervention schools. Two volunteer schools in NSW agreed to participate as comparison schools. A further volunteer school in Victoria initially agreed to participate as an intervention school but withdrew after classroom teachers became aware of the time commitment involved; data from this school are not included in the present study. All students in Years 5 and 6 (ages 10–12 years) in each of the six participating schools were invited to participate. Ethics approval was obtained from the Australian National University and from each state government education department, with permission obtained from each participating school principal and staff member alongside parent consent and student assent.
Quantitative data collection took place at the beginning of Term 1 in February 2018 and again at the end of Term 2 in August and September 2018, after completion of the SOAR program, in both intervention and comparison schools. (In Australia, the school year starts in late January and concludes in mid-to-late December, divided into four terms.) Students completed online or paper surveys in classrooms, supported by SOAR research staff. Each school negotiated its preferred survey mode (online or paper) prior to data collection, and all students in each school completed surveys using the same mode. Six hundred and forty-five students completed surveys across the six participating schools: 252 students in comparison schools and 393 students in intervention schools. Survey instruments had previously been used in a large-scale, population-level survey of school students (n = 4664) across NSW and Victoria [11, 22].
Qualitative data collection took place in intervention schools in August and September 2018. Key informant interviews with school staff and focus groups with students were conducted. Ten staff interviews were conducted across the four intervention schools (five in each state). Those interviewed included teachers responsible for delivering the SOAR program, school leaders such as Assistant Principals, and affiliated staff such as Anti-Racism Contact Officers and wellbeing teachers. Nine focus groups with students (five in NSW and four in Victoria) were conducted across the four intervention schools. Focus groups ranged from two to eight participants, with most containing six, and lasted 18 to 25 minutes. All qualitative data collection occurred on site at the participating schools and was audio-recorded and transcribed.
Bystander responses
Students were asked 11 items about their bystander responses, adapted from the Participant Role Questionnaire. Items comprised three sub-scales: ‘Assistant’, ‘Defender’ and ‘Outsider’. Students were asked, when they saw other students treated unfairly because of their race/ethnicity/cultural background, how often they engaged in each response (e.g. I joined in; I helped the student being treated badly; I didn’t do anything). Response options were Never/Hardly ever/Sometimes/Most of the time/Always, scored 1–5. Sum scores were created for each sub-scale, with high scores indicating a negative outcome on the assistant and outsider scales and a positive outcome on the defender scale (Cronbach’s α: assistant 0.80; defender 0.90; outsider 0.74).
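Scale scores of this kind are sums of Likert items, with internal consistency summarised by Cronbach's α. As a minimal sketch of that computation (not the study's analysis code; `cronbach_alpha` and the example responses are hypothetical):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of Likert items.

    item_scores: one row per respondent, each a list of item
    responses (e.g. 1-5). Hypothetical helper for illustration.
    """
    k = len(item_scores[0])                 # number of items in the scale
    columns = list(zip(*item_scores))       # transpose to per-item columns
    item_vars = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of sum score)
    return k / (k - 1) * (1 - item_vars / total_var)

# Perfectly consistent responses (every item identical) give alpha = 1.0
rows = [[1, 1, 1], [3, 3, 3], [5, 5, 5], [2, 2, 2]]
print(round(cronbach_alpha(rows), 3))
```

In practice such reliabilities are computed in statistical software; the sketch just makes the sum-score and variance-ratio logic explicit.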
Self-efficacy to intervene
Students were asked four items about their self-efficacy to intervene when they saw other students treated unfairly because of their race/ethnicity/cultural background. They were asked ‘How confident are you that you could… stop it; help them feel better or cheer them up; go to a teacher for help; go to another adult for help.’ Response options were Not at all confident/Not very confident/Neither confident nor unconfident/Confident/Very confident and scored 1–5. A sum score of these items was created with a high score indicating a positive outcome [56,57,58] (Cronbach’s α = 0.80).
Perceived school inter-ethnic climate
Students were asked seven questions about their perceptions of the inter-ethnic climate of their school, adapted from the School Interracial Climate Scale. Three questions were about the teacher climate (e.g. Teachers encourage students to make friends with students of different racial/ethnic/cultural backgrounds; Teachers here like students of different ethnic groups to understand each other) and four about the peer climate (e.g. Students are able to make friends with students from different racial/ethnic/cultural backgrounds). Response options were Strongly disagree/Disagree/Neither agree nor disagree/Agree/Strongly agree, scored 1–5. Sum scores were created separately for teacher and peer climate, with high scores indicating a more positive climate (Cronbach’s α = 0.76 teacher; Cronbach’s α = 0.46 peer). Attempts to revise the peer climate scale and reduce the number of items using factor analysis did not increase the Cronbach’s alpha to an acceptable level.
Peer pro-social norms
Students responded to five questions about their perceptions of the pro-social norms of their school peers. They were asked ‘How many students at your school… Stand up for students who are made fun of or bullied?; Help other students even if they don’t know them well?; Care about other people’s feelings?; Help stop arguments between other students?; Are nice to everyone, not just their friends?’. Response options were Hardly any/Few/Some/Most/Almost all, scored 1–5. A sum score of these items was created with a high score indicating a positive outcome (Cronbach’s α = 0.87).
Social and emotional wellbeing
The Strengths and Difficulties Questionnaire (SDQ) is a brief questionnaire assessing the psychological adjustment of children and youth. The youth-reported (11–17 years) SDQ consists of 25 items across five subscales. We examined the total difficulties scale and the prosocial behaviour subscale, which reflects student perception of their own prosocial behaviour (I try to be nice to other people; I usually share with others; I am helpful if someone is hurt; I am kind to younger children; I often volunteer to help others). The SDQ is not intended to be used as a diagnostic instrument; rather, it indicates problematic emotions and behaviours across a range from normative to highly elevated. While cut-points have been developed for the SDQ, these have not been validated for ethnic minority youth, so continuous scores are analysed here. The SDQ is the most commonly used measure of child and youth mental health in Australia and has been shown to be psychometrically sound in Australia [63, 64].
Loneliness at school
Five items asked students about their connectedness at school, e.g. ‘I have nobody to talk to; It’s easy for me to make new friends; I can find a friend when I need one’. Response options were Not true at all/Hardly ever true/Sometimes true/True most of the time/True all of the time, coded 1–5. Items were reverse coded as required and a sum score was created with a high score indicating a negative outcome (Cronbach’s α = 0.70).
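Reverse coding of positively worded items before summing, as described above, can be sketched as follows (the item names and responses are hypothetical illustrations, not the survey instrument):

```python
def reverse_code(score, low=1, high=5):
    """Flip a Likert score so all items point in the same direction."""
    return low + high - score

# Hypothetical loneliness responses: positively worded items such as
# "It's easy for me to make new friends" are reverse coded before
# summing, so a high total indicates greater loneliness.
responses = {"nobody_to_talk_to": 4,        # negatively worded: keep as-is
             "easy_make_friends": 2,        # positively worded: reverse
             "find_friend_when_needed": 1}  # positively worded: reverse
reverse_items = {"easy_make_friends", "find_friend_when_needed"}

total = sum(reverse_code(v) if k in reverse_items else v
            for k, v in responses.items())
print(total)  # 4 + (6-2) + (6-1) = 13
```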
Teacher empathy
Students were asked four questions about their teachers’ empathy, adapted from the Department of Education and Training Victoria’s Attitudes to School Survey, e.g. ‘My teachers care about me; My teachers are good at dealing with racism when it happens’. Response options were Strongly disagree/Disagree/Neither agree nor disagree/Agree/Strongly agree. A sum score was created with a high score indicating a positive outcome (Cronbach’s α = 0.82).
Inter-ethnic contact
Five questions were asked about student inter-ethnic contact, adapted from a previously used measure, e.g. ‘I have participated in cultural events with people from other racial/ethnic/cultural backgrounds’. Response options were Strongly disagree/Disagree/Neither agree nor disagree/Agree/Strongly agree, scored 1–5. A sum score of these items was created with a high score indicating a positive outcome (Cronbach’s α = 0.81).
Sleep
Sleep duration was self-reported: students were asked what time they fall asleep and wake up on a usual school day and on a non-school day. Sleep duration was calculated as the difference between reported sleep time and reported wake-up time, separately for school and non-school days. Sleep latency was measured using a single item, ‘During the last four weeks, how long did it usually take for you to fall asleep?’. A three-category analytic variable was created: 0–30, 30–60, >60 min. Sleep disruption was measured using a single item, ‘During the past four weeks, how often did you awaken during your sleep time and have trouble falling back to sleep again?’. A three-category analytic variable was created: None/A little, Some/A good bit, Most/All. These items have previously been used with children and adolescents from diverse ethnic backgrounds.
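The duration calculation described above must handle sleep onset before midnight. A minimal sketch, assuming times are reported as ‘HH:MM’ (the function and its handling are illustrative, not the study's actual code):

```python
from datetime import datetime, timedelta

def sleep_duration_hours(sleep_time: str, wake_time: str) -> float:
    """Hours between reported sleep onset and wake time ("HH:MM"),
    wrapping past midnight when onset is in the evening."""
    fmt = "%H:%M"
    sleep = datetime.strptime(sleep_time, fmt)
    wake = datetime.strptime(wake_time, fmt)
    if wake <= sleep:              # fell asleep before midnight
        wake += timedelta(days=1)  # wake time is on the next day
    return (wake - sleep).total_seconds() / 3600

print(sleep_duration_hours("21:30", "07:00"))  # 9.5
```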
Direct experiences of racial discrimination
Student reports of racial discrimination experiences were measured using 10 items drawn from the Adolescent Discrimination Distress Index together with two items used previously with diverse Australian school students; see Additional file 1. Items assessed discrimination by peers at school (4 items), by school personnel (3 items) and by others in society (5 items). Each discrimination item was followed by the attribution (‘because of…’), with ‘your race/ethnicity/cultural background’ being one of three non-mutually exclusive options. ‘Culture’ is commonly used to refer to race or ethnicity in Australian community vernacular, so it was included in the attribution following previous approaches [39, 51]. Frequency of each experience was indicated from 0 = ‘This did not happen to me’, 1 = ‘Once or twice’, 2 = ‘Every few weeks’, 3 = ‘About once a week’, to 4 = ‘Several times a week or more’. Following previous approaches, a score was created by calculating the mean frequency rating across these items (Cronbach’s α = 0.91).
Vicarious racial discrimination
Student-reported vicarious discrimination was measured using five items drawn from previous Australian studies; four items assessed discrimination from peers at school and one item assessed discrimination from teachers. Students were asked how often they had seen other students treated unfairly, e.g. treated with less respect by other students because of their race/ethnicity/cultural background (4 items), and how often they had seen ‘other students being picked on or treated with less respect by teachers at this school because of their race/ethnicity/cultural background?’ (1 item). Response options were 0 = Never, 1 = Hardly ever, 2 = Sometimes, 3 = Most of the time, and 4 = Always. Following previous approaches, a score was created by calculating the mean frequency rating across these items (Cronbach’s α = 0.88).
Ethnicity was measured using a self-reported variable with categories developed for the study; self-reported race/ethnicity is not routinely collected in Australia, so a standard classification is not available. Students were able to select multiple categories, and an open-ended ‘other’ category was available. Following international approaches, a prioritization method was used to classify multiple responses into mutually exclusive categories based on level of stigmatization in Australia, in the following order: Indigenous; Pacific Islander, Maori, Middle Eastern, African, Latinx (non-Asian ethnic minority); Asian; Anglo/European (white). An ‘Other/missing’ category was also included in the ethnicity analytic variable. We recognise the considerable heterogeneity within each of these groups and do not wish to confuse or conflate Indigenous or ethnic identities while creating analytic categories with sufficient numbers for analysis. Religion was self-reported, with the following analytic categories used: no religion; Christian; Islam; other religion (Buddhism, Hinduism, Judaism, other). Gender (female, male, and other) and school year (5 or 6) were self-reported. Socioeconomic position at the school level was measured using the Index of Community Socio-Educational Advantage, a composite of average parent occupation and education across students, school geographic location, and proportion of Indigenous students.
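The prioritization method amounts to a first-match lookup over an ordered category list: a student selecting several categories is assigned the highest-priority one. A minimal sketch (category labels simplified and the helper hypothetical, not the study's code):

```python
# Priority order from most to least stigmatized, per the study's
# classification; the grouped minority label is collapsed here
# for illustration.
PRIORITY = ["Indigenous", "Non-Asian ethnic minority", "Asian", "Anglo/European"]

def classify_ethnicity(selected):
    """Collapse multiple self-reported categories into one mutually
    exclusive analytic category via prioritization."""
    for category in PRIORITY:
        if category in selected:
            return category
    return "Other/missing"   # no recognised category selected

print(classify_ethnicity(["Asian", "Indigenous"]))  # "Indigenous"
print(classify_ethnicity([]))                       # "Other/missing"
```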
Across the sample of n = 645 students, 15% had data available at baseline only and 11% at follow-up only. These missing data are likely related to student absenteeism on data collection days, given the surveys were completed as a one-off during classroom time. Across the n = 645 sample, the bystander response measures had the highest level of missingness (10% no data), with other outcomes having far less missing data (e.g. self-efficacy to intervene 2%; peer prosocial norms 4%; school climate 1%; discrimination 1%).
Qualitative data collection entailed semi-structured interviews with classroom teachers and school leaders and semi-structured focus groups with students. Interviews and focus groups were guided by topic guides that explored staff and student experiences of the program including perceived need for the program, program implementation, program impacts and suggested improvements. Open-ended questions aimed to capture in-depth descriptions of the experiences and impacts of the project on staff, students and schools.
Demographic characteristics of the sample were described by individual school and by study arm (comparison and intervention). Multilevel mixed-effect models were used to analyse the pre- and post-survey findings, accounting for non-independence in the data, in our case paired observations on individual students nested within schools. These models also enabled the analysis to use all available data, that is, participants with baseline or follow-up data only as well as those with data at both timepoints. A linear mixed model was fitted for each continuous outcome and a logistic mixed model for each categorical outcome. Because the study combines a pre-post design with intervention and comparison arms, each model included a group-by-time interaction term; this term quantifies the effect of the intervention as the difference in pre-post change between the intervention and comparison groups. Statistical significance was set at p < 0.05.
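The group-by-time interaction corresponds to a difference-in-differences: the pre-post change in the intervention arm minus the pre-post change in the comparison arm. A minimal numeric sketch using hypothetical group means (not study results; the actual estimates come from the fitted mixed models):

```python
# Hypothetical group means on some outcome scale, for illustration only.
means = {
    ("intervention", "pre"): 14.0, ("intervention", "post"): 16.5,
    ("comparison",   "pre"): 14.25, ("comparison",  "post"): 14.75,
}

change_intervention = means[("intervention", "post")] - means[("intervention", "pre")]
change_comparison = means[("comparison", "post")] - means[("comparison", "pre")]

# The interaction coefficient in the mixed model corresponds to this
# difference in pre-post change between the two study arms.
interaction_effect = change_intervention - change_comparison
print(interaction_effect)  # 2.5 - 0.5 = 2.0
```

A positive value here would indicate greater improvement in the intervention arm than would be expected from the comparison arm's change alone.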
Following transcription of the recordings, interviews and focus groups were thematically coded and analysed using NVivo. The coding framework combined theoretically driven (a priori) codes with codes developed inductively from the data. The data were analysed under ten thematic nodes. Staff and student data were analysed separately. Multiple researchers conducted coding and discussed findings iteratively with each other during the analysis process, as well as with the wider research team, to triangulate findings, enhance reflexivity and reduce the risk of bias.