Study setting and design
The sampling frame comprises of tea plantations belonging to one regional plantation company (RPC). The RPC managed a total of 12 tea plantations, five of which ran the midday meals program as of June 2015. The program was planned to be rolled out across the other seven plantations in due course. All plantations were located in close proximity within the Thalawakale region in Sri Lanka. This setup can be considered as a natural experiment where children from plantations with and without the midday meals program are compared. Longitudinal data was collected on the heights and weights of all children below the age of 5 years living in sampled plantations. Available demographic data was also collected for the sampled children together with institutional level data. Sample sizes required for the treatment and control groups were calculated following standard methodology for sample size determination in field surveys [18, 19]. The formula, \( N=\frac{p\left(100-p\right){z}^2}{E^2} \) is used where E is set at 5%, z at 95 and 50% coverage of the program within the sampling frame is assumed. Simple Random Sampling using the Order Sampling approach was used to pick the sample of treatment and control plantations [20]. The sampling procedure resulted in 3 plantations (2 treatment and 1 control) being selected with a sample size of 799 children in the treatment group and 480 in the control group. The quantitative survey was conducted in June 2015.
The main source of anthropometric data and program attendance data was through midwife and CDC records. Consent for the provision of this data was obtained from plantation midwives and the child development officers (CDO) who head each CDC. Midwives and CDOs were also interviewed in order to collect institutional level data, information on the general health status of plantation children and details of the implementation and management of the midday meals programs. A housing quality checklist questionnaire was used to assess the general housing quality in each of the visited plantations, using a convenience sample of households.
Ethics approval and data quality procedures
The necessary ethics approvals were obtained from the Monash University Human Research Ethics Committee (MUHREC) and the Plantations Human Development Trust - the main government authority overseeing human development within the tea industry in Sri Lanka. Approvals were also obtained from the management of the charitable foundation implementing the midday meals program and the regional plantation company managing the plantations.
As participants, written informed consent was obtained from all midwives and CDOs interviewed. Parental consent was not required by MUHREC or by the Sri Lankan authorities, as children were not directly interviewed during the survey. The study relies on data routinely collected and maintained by midwives and CDOs for health monitoring purposes. Complying with their approval and privacy protocols, this data on children’s weight and height measures was released by them in an appropriate non-identifiable format.
Data
Anthropometric data for all children below the age of 5, living in sampled plantations was collected for the period January 2013–June 2015. Data comprised of an unbalanced panel of body weights (kg) and heights (cm) for children recorded by the estate midwife or CDOs. Weights were recorded monthly while the heights were recorded quarterly. Since data was only available for the period January 2013–June 2015, early-age weight and height measures for older children (3–5 year olds as of 2015) were not available. Each child’s panel started with the first available weight measure for the child. Possible impacts of this truncation of sample data is investigated as part of the robustness analysis. To match the monthly frequency of weight measures, missing monthly heights were imputed using cubic spline interpolation. Children’s weight-for-height/length was calculated following the conventional method [21].
Z-scores for the weight-for-age (WAZ), height-for-age (HAZ), and weight-for-height/length (WHZ) were obtained following WHO methodology [22], using their igrowup macros. Following WHO definitions for underweight, stunting and wasting [23], binary variables were then constructed to indicate if a child was underweight, stunted or wasted in any given time period, by considering whether WAZ, HAZ and WHZ for each child in each time period falls below − 2.00 respectively. These binary variables, together with the z-scores, are used as dependent variables in models.
Given the primary intention of the study is to provide insights that have relevance to other similar programs, it is important to use the globally accepted WHO methodology when constructing the dependent variables in the models [21]. There are a couple of relevant control variables where it is more appropriate to standardise using sample medians and standard deviations, as their purpose is to control for within-sample differences between children prior to the treatment. These are the standardized first weight/height measures for each child and their birthweight.
Other child level variables collected included date-of-birth, gender, birthweight and ethnic group. A CDC level variable indicating the physical condition of the CDC (old, upgraded or new) is also included in models. The treatment variable used in modelling is a time-invariant binary variable (Trt) identifying whether a child lived in a treatment or control plantation. Hence Trt identifies access to the midday meals program.
Empirical strategy
Identification of the treatment effect relies on the staggered introduction of the midday meals program across similar tea plantations in similar localities and under the same management. The broad assumption here is that the plantations belonging to the treatment and control groups are generally similar – an assumption whose validity will be critically evaluated. Validity is also supported by relying on random selection of the specific plantations to include in the sample, from the populations of treatment and control plantations.
The following model is used to model the treatment effect:
$$ {H}_{i,t}=\alpha +{\beta}_1{Trt}_i+{\beta}_2{X}_{i,t}+{\beta}_3{Z}_i+{\varepsilon}_{i,t} $$
(1)
The subscripts i and t refer to child i at age t (in months). H represents a set of dependent variables measuring child growth - the standardized weight-for-age (WAZ), height-for-age (HAZ), weight-for-height/length (WHZ), and binary variables underweight, stunting and wasting. The Vectors X and Z include time-varying and time-invariant child-level variables respectively, while Trt is a binary variable indicating whether a child belonged to the treatment or control group. We assume all children in a treatment plantation have access to the midday meals program irrespective of the employment status of their parents/guardian. This assumption is deemed valid given the low barriers to entry to plantation employment throughout the year. Models for WAZ, HAZ and WHZ are fitted using a random-effects instrumental variable specification while models for underweight, stunting and wasting are fitted using random-effects logistic specification. All models fit robust standard errors.
Different variants of the above model were also fitted, to establish the robustness of the results. These include modelling by gender, birthweight and birth-year cohort, also helping explore equity issues, and using different estimation strategies (e.g. Generalized Least Squares Random Effects).
The limited number of observed demographic variables could mean there is a risk of omitted variable bias. Whilst a fixed-effects specification could have dealt with the issue of unobserved child-level variables, the time-invariant treatment variable does not allow for the use of fixed-effects. Therefore, it is important to address a number of specific concerns regarding the identification strategy.
Possible endogeneity of treatment variable
The treatment has been allocated across the tea plantations within the sampling frame based on managerial decisions about the order of roll-out of the program across plantations. Management report no systematic factors that influenced the decision about order, other than practical / administrative matters. However, if there are systematic unobserved differences between the treatment and control groups that are correlated with child development outcomes, this risks creating an endogeneity bias in model estimates. Baseline data can be used to check for similarity between treatment and control plantations. Since the midday meals intervention applies to children aged at least 6 months, we compare weight records of children up to 6 months of age across the treatment and control samples. Any differences cannot be attributed to the program, and instead could indicate relevant difference between the two groups.
The histograms of weight-for-age zscores (WAZ) in the 0–6 month age category across the treatment and control groups, together with the standard normal distribution curve, are presented in Fig. 1. According to this, the distribution of WAZ appear to be closer to the standard normal distribution within the control group than the treatment group. The two-sample Kolmogorov-Smirnov test for equality of distribution functions, rejects the null hypothesis of equality in distribution between the treatment and control groups (Combined K-S: 0.22, p < 0.000). These results suggest that children in the treatment group may be starting at a relative growth disadvantage compared to children in the control group, at the baseline.
A number of methods are used to overcome potential endogeneity of the treatment variable, the most significant of which is to control for measures of past growth, using a lagged dependent variable (Hi, t − 1 in eq. (1)) as a control variable in the models. We also use other measures of past growth (birthweight, first recorded weight and height) as control variables. Proxy variables are also included to control for important unobserved time-invariant factors, where applicable. Together these measures should account for any significant time-invariant factors that may drive systematic differences between the treatment and control samples with respect to the growth of children. In addition to this we also include proxy variables to control for possible time-varying factors which could also cause systematic differences between the treatment and control groups. Added together, these methods provide reasonable confidence that we are capturing a causal effect with the treatment variable, having overcome any possible significant bias in our identification strategy. We next give details of the main relevant unobserved time-invariant and time-varying factors whose impacts are accounted for through this analytical approach.
Time-invariant unobserved variables
The first characteristics to consider are the general living standards and facilities in the estates. Systematic differences in these characteristics across treatment and control groups could, if not measured, find their way into the model error, and be correlated with the Treatment indicator variable, resulting in an endogenous treatment variable. In addressing this concern, a housing quality checklist was used to collect data on the average quality of housing within each plantation, and Principal Components [22] was used to construct a housing quality index. The housing quality index was statistically significant and positive in the weight-for-age model but not significant in height-for-age and weight-for-height models. Importantly, including this index in the model had virtually no impact on the estimated treatment effects.
Household income and food security are also not observed in our data, and these could also vary with the treatment and outcome variables, leading to an endogeneity risk. With regards to household income, most plantation families are employed within the tea industry [8, 24]. Daily wages paid to plantation workers are well regulated, consistently low and stagnant over time [8], and there is limited access to other forms of income through non-plantation work [8, 24]. This suggests that the average household income would not vary much over time and across the three study plantations. Given the stagnant nature of income over a period of just a few years, using past measures of growth (i.e. the prior period’s growth, first weight/height measure on record) would account for any time-invariant impact of household income on the growth of children.
We also consider food security as another possibly important unobservable. There is likely to be a strong link between household income and food security [25], so the arguments above regarding income would also apply to food security. The exception is the possibility that socio-cultural factors could also impact food security due to culturally induced food preferences or aversions. However, given that over 80% of plantation sector residents are from an Indian-Tamil ethnic background [24], it is reasonable to assume homogeneity in food preferences across the sampled plantations.
Family economic background and parental education are also unobserved household level variables which tend to be fixed over time [26]. Therefore, it is reasonable to assume that whilst these variables would impact child growth, they would not cause any systematic differences between treatment and control groups. Nevertheless, it is important to control for as many of these effects as possible. Birthweight is an ideal of a relevant proxy for the general socio-economic background of children as it impacts their growth. Apart from the rare occasion of a genetic/medical complication which impacts a child’s in utero growth, birthweight is usually a consequence of the nutrients that the mother receives during pregnancy. Past studies have established the close relationship between maternal and child nutrition, including in the plantation sector of Sri Lanka [27]. Therefore, we use birthweight as a proxy for the general economic background of the child’s family, especially around the time of birth.
Potential time-varying unobserved variables
The level of parental care and general child-care practices at home can have a significant impact on the growth of children, particularly considering the female dominant labour structure and the relatively large household sizes in tea plantations [25]. Even though there is little evidence to suggest that parental care would differ significantly between treatment and control plantations, not accounting for parental care received by the child may potentially bias our results. Therefore, it is important to control for this effect. We do so with a proxy based on accessing maternal health services. In Sri Lanka, parents of children living in identified vulnerable communities are strongly advised to visit a midwife for check-ups on a monthly basis. Regularity of monthly midwife visits would provide a proxy for the level of parental/guardian care and attention received by the child. A mother who attends all or most check-ups is likely to have a higher level of commitment to parental care than one who has erratic attendances. The proxy variable pcarei t comprises the proportion of midwife/clinic visits completed compared to the number available, for each child i in each time period (months) t.
Inter-plantation migration due to midday meals program
A final methodological issue is the potential for plantation families to relocate between treatment and control plantations, based on the availability of the midday meals program. If this occurs, it is a possible source of bias in estimating treatment effects. In practice, this is highly unlikely, given the generally landless status of plantation residents [24]. Plantation residents are provided housing and other amenities through the plantation management and their right to the homes they live in is partly determined by several generations of a family living and working within the same plantation [8, 24]. There is virtually no freedom to relocate between plantations.
Sample truncation
Given the time window in which anthropometric measures were collected (i.e. January 2013–June 2015), non-random sample truncation is a possible concern in the study, as the window of observation does not allow for observing early measures of weights and heights for older children in the sample. We include the child’s birth year and age as independent variables in order to account for this effect. Further robustness tests are also carried out to determine the effects of non-random truncation/ attrition using sample attrition weights in models [28].
Statistical analysis
De-identified height/weight data provided by the midwives and CDOs were entered to Microsoft Excel 2013 together with other child and CDC level variables. Data was then imported to STATA SE 15 and the dependent variables were constructed using the WHO igrowup macro.
Instrumental variables
The model in eq. (1) is fitted to standardized weight-for-age (WAZ), height-for-age (HAZ) and weight-for-height/length (WHZ). As indicated, past measures of growth (i.e. lagged WAZ, HAZ and WHZ etc.) are used as controls, to account for some of the unobserved time-invariant factors such as household income and food security. However, in WAZ, HAZ and WHZ models, this poses an additional analytical issue, as the lagged dependent variable becomes an intermediate variable in the causal pathway between the treatment and dependent variables. An intermediate variable can be defined as a variable which forms part of the causal pathway in the relationship between a treatment and an outcome. As a result, part of the treatment effect may be masked by the intermediate variable. In addition to this, there may also be other unobserved variables which impact both the present and previous period’s growth which could also bias results. For these reasons, we use an Instrument Variable Random Effects (IV RE) estimation method which instruments for the lagged dependent variable in WAZ, HAZ and WHZ models.
The instrumenting strategy involves a first-stage equation of the form:
$$ {H}_{i,t-1}={\delta}_1+{\delta}_2 IV+{\delta}_3{X}_{i,t}+{\delta}_4{Z}_i+{u}_{i,t} $$
(2)
where IV refers to the set of instruments used in each model and the vectors X and Z represent the time variant and time invariant variables included in eq. (1). H represents the lagged WAZ, HAZ or WHZ. Birthweight and the lagged parental care variable (pcarei t-1) are used to instrument for WAZi,t-1 in eq. (2). Given the sample comprises of children below the age of 5 years, it is reasonable to assume that the birthweight would be correlated with the WAZ in early years, an assumption confirmed by the analysis. It is also reasonable to assume that the relationship between birthweight and WAZi,t will occur via WAZi,t-1. Therefore, birthweight can be considered as a valid instrument for WAZi,t-1 in the model.
Considering the lagged parental care variable (pcarei t-1), this variable reflects the general history of care received by children. Therefore, pcarei t-1 will also be strongly correlated with WAZi,t-1. In addition to this it is again reasonable to assume that the relationship between the lagged parental care variable (pcarei t-1) and the current periods weight-for-age will manifest through the previous months weight-for-age. Therefore, pcarei t-1 can also be considered as a valid instrument for the lagged dependent variable in the weight-for-age model.
With regards to the height-for-age models, pcarei t-1 is again used as a valid instrument for HAZi,t-1. A standardized value of the first height on record (firstzheight) is also used as an instrument. Birthweight, standardized value of first height (firstzheight) and the lagged parental care variable are all used as instruments for the previous period’s weight-for-height/length (WHZi,t-1) in the weight-for-height/length models.
The validity of the instruments is assessed using the Kleibergen-Paap rk test (for under-identification), Cragg-Donald Wald F Test (weak instrument) and Sargan-Hansen test (validity of over-identification restrictions). Test results indicate that the instruments are valid in most of the main models. However, the validity of instruments cannot be confirmed in some of the subsample models, likely due to smaller sample sizes. Generalized Least Squares Random Effects (GLS RE) models are fitted in these cases, as a robustness test.
For modelling the binary variables underweight, stunting and wasting a Logit Random Effects estimation is used together with lagged values of WAZ, HAZ and WHZ as control variables respectively. An instrument variable approach is not necessary in this case as these control variables are not direct intermediate variables in these models.