The data used in this analysis were collected at baseline for the evaluation of an innovative educational program designed to improve the nutritional well-being of elementary school children. The program’s development involved technical support from the China Center for Health Education (Beijing), education and public health officials at the provincial level, and school administrators, teachers and parents at the local level. This was a multiyear project. Prior to the school health education program’s implementation, all parents received information about the program and signed informed consent documents approving their children’s participation. The city Education Department’s health education professionals collected the data from students in their classrooms before and after the health education program implementation.
Multistage stratified cluster sampling was used for sample selection. First, two economically different provinces were chosen: one economically developed (Shandong) and one economically disadvantaged (Qinghai). Shandong is an east-coast province situated about halfway between Beijing and Shanghai. It is one of the most populous provinces in China, with 95.8 million people, and is the third wealthiest province in China. Qinghai province is in western China, next to Tibet. Qinghai’s population of 5.6 million includes a number of ethnic minority groups and many ethnic autonomous areas. Its economy is among the poorest in China.
In each province, cities fall into one of three classes, based on population size. To increase diversity of the sample, schools were chosen from cities of different classes. All students in Grade 3 (ages 8–10 years) in each school were included in the school health education program and evaluation. Each chosen school met three criteria: the school had never conducted a health education project or diet/nutrition-related project before; the total number of students enrolled in the school was over 1000; and leaders of the school and local health bureau approved the project and were willing to cooperate with the research team. The overall project included a baseline survey, an educational intervention, and a post-intervention survey. Only data from the baseline survey were used in this analysis.
The questionnaire completed by the students was developed by Chinese epidemiologists and educators, was written in simplified Chinese, and contained four parts.
The demographic questions covered home province, gender, age, parents’ educational level, main person cooking at home, whether living with parents, and frequency of living with grandparents.
Height and weight measures recorded by the school doctor were used to calculate Body Mass Index (BMI = weight (kg) / height (m)²). BMI results were classified according to the age-specific standards established by the China Ministry of Education. A separate question asked students about their self-image: whether they felt they were obese, overweight, normal weight or underweight.
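The BMI calculation can be sketched as follows. This is a minimal illustration, assuming weight is recorded in kilograms and height in meters (the units are not stated in the text); the function name and the example student are hypothetical, and the age-specific Ministry of Education cut-offs are not reproduced here.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight divided by the square of height.

    Assumes kilograms and meters (an assumption; the source does not
    state the units used by the school doctor).
    """
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    # Parentheses matter: weight / height * height would simply return weight.
    return weight_kg / (height_m ** 2)


# Hypothetical Grade 3 student, for illustration only.
example_bmi = bmi(30.0, 1.35)
print(round(example_bmi, 2))
```

The squared height is written explicitly in the denominator so that operator precedence cannot change the result.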
Nutrition knowledge questions
Questions assessed students’ knowledge about the most important staples, fruits, vegetables, refined and unrefined grains, and daily water consumption. Each correct answer was scored 1, giving a total score between 0 and 7, with 7 being the highest number of correct answers.
Eating behaviors questions
Questions assessed behaviors including frequency of eating breakfast, choice of beverages, staple food for lunch and dinner, vegetable/meat proportions for lunch and dinner, frequency of eating vegetables, fruits and fried foods, and consumption of milk and water. Answers that met pre-established criteria (for example, ate breakfast “every day”) were scored 1, giving a total score from 0 to 9, with 9 being the highest number of eating behaviors meeting the criteria for health.
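The knowledge and behavior scores are both simple counts of items that meet a criterion. A minimal sketch of that scoring rule is given below; only the breakfast criterion ("every day") comes from the text, while the other item names, criteria, and the function name are hypothetical placeholders.

```python
def criterion_score(answers: dict, criteria: dict) -> int:
    """Count the items whose answer matches the pre-established criterion.

    Each matching item contributes 1; missing or non-matching items
    contribute 0, so the score ranges from 0 to len(criteria).
    """
    return sum(1 for item, target in criteria.items()
               if answers.get(item) == target)


# Hypothetical subset of behavior items (the real instrument has nine).
BEHAVIOR_CRITERIA = {
    "breakfast": "every day",   # criterion given in the text
    "beverage": "water",        # assumed for illustration
    "fried_food": "rarely",     # assumed for illustration
}

student = {"breakfast": "every day", "beverage": "soda", "fried_food": "rarely"}
score = criterion_score(student, BEHAVIOR_CRITERIA)
print(score)
```

The same function would serve for the knowledge items, with the correct answers taking the place of the behavior criteria.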
Because nutrition knowledge and eating behavior were assessed with criterion-referenced tests, reliability estimates were precluded [23, 24].
Students in China are accustomed to providing family demographic information on surveys, tests and exams beginning as early as age 6. As with any survey, there is a likelihood of inaccurate information, but in this environment the likelihood is small. The demographic questions included an “I don’t know” option to allow students to indicate not knowing or to opt out of answering.
Cluster analysis was chosen because the socio-demographic variables examined here are known to be highly associated with each other. We believed that other methods, such as regression models, which focus on identifying independent effects and do not take into account the actual associations between the variables, would provide misleading results. To analyze highly intercorrelated variables, we preferred to examine their combined rather than their singular effects. Cluster analysis examines the joint effects of the variables. The members of each cluster are similar to each other in their characteristics and different from those in the other clusters. This method is especially applicable to socio-demographic variables for finding underlying patterns of commonality that may not be readily apparent from basic descriptive statistics or variable-centered analyses like correlation and regression.
Cluster analyses were performed with the Statistical Package for the Social Sciences (SPSS version 22.0) two-step cluster procedure using categorical variables. For binary categories (e.g., yes-no), categorical clustering maximizes the difference in the binary distributions of the groups. For multinomial categories, categorical clustering differentiates on the basis of response percentage within the response options. The SPSS cluster procedure’s default options were used: log-likelihood as the distance measure for pre-clustering, and the Bayesian Information Criterion (BIC) as the clustering criterion for the clustering step. SPSS automatic cluster determination was used to produce the initial cluster solution. Alternative cluster solutions around this initial solution were then tested for potentially better fits using three criteria: (a) silhouette, which measures how tightly clusters group, with higher scores indicating a better fit; (b) sum of squares within groups (SSE), which measures cohesion within clusters, with lower scores indicating a better fit; and (c) sum of squares between groups (SSB), which measures separation, with higher scores indicating a better fit. Missing data were handled with listwise deletion.
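Two of the fit criteria and the missing-data rule can be illustrated with a short sketch. This is not the SPSS two-step procedure; it only shows how SSE and SSB are computed for a given cluster assignment, assuming (for illustration) that the categorical responses are coded as 0/1 indicators. The data, labels, and function names are hypothetical, and the silhouette measure is omitted.

```python
def listwise_delete(rows):
    """Drop every row that has at least one missing (None) value."""
    return [r for r in rows if all(v is not None for v in r)]


def sse_ssb(points, labels):
    """Return (SSE, SSB) for numeric points grouped by cluster label.

    SSE: squared distances of points to their own cluster centroid.
    SSB: cluster-size-weighted squared distances of centroids to the
         grand centroid.
    """
    n, dim = len(points), len(points[0])
    grand = [sum(p[j] for p in points) / n for j in range(dim)]
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    sse = ssb = 0.0
    for members in clusters.values():
        m = len(members)
        centroid = [sum(p[j] for p in members) / m for j in range(dim)]
        sse += sum((p[j] - centroid[j]) ** 2
                   for p in members for j in range(dim))
        ssb += m * sum((centroid[j] - grand[j]) ** 2 for j in range(dim))
    return sse, ssb


# Hypothetical 0/1-coded responses; the last row has a missing value.
raw = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 0), (1, None)]
points = listwise_delete(raw)
labels = [0, 0, 0, 1, 1, 1]          # an assumed two-cluster solution
sse, ssb = sse_ssb(points, labels)
print(sse, ssb)
```

For a fixed data set, SSE and SSB sum to the total sum of squares, so a solution that lowers SSE necessarily raises SSB; comparing them across solutions with different numbers of clusters is what distinguishes candidate fits.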
The cluster analysis was done with socio-demographic variables identified in the literature review [10,11,12,13,14,15,16,17,18,19]: gender, father’s education, mother’s education, whether student lived with parents, frequency of student living with grandparents, main person cooking at home, and self-image. Descriptive statistics highlighted the characteristics of the clusters.
Subsequent Analysis of Variance (ANOVA) was used to compare the differences among clusters in BMI, nutritional knowledge and eating behaviors. The student knowledge scores and behavior scores, which are numeric counts of how many criteria a student met, were treated as continuous ratio-level numbers for the ANOVA.
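The logic of the one-way ANOVA comparison can be sketched as below. The analysis in this study was run in SPSS; this pure-Python version is only an illustration of how the F statistic is formed from between-cluster and within-cluster variation, and the scores shown are invented, not study data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of numeric scores per cluster.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far cluster means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented knowledge scores for three hypothetical clusters.
scores_by_cluster = [[3, 4, 5], [5, 6, 7], [7, 8, 9]]
f_stat, df_b, df_w = one_way_anova_f(scores_by_cluster)
print(f_stat, df_b, df_w)
```

A p-value would be obtained by referring the F statistic to the F distribution with the returned degrees of freedom, which is omitted here to keep the sketch dependency-free.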