Assessment of data quality in a multi-centre cross-sectional study of participation and quality of life of children with cerebral palsy

Background: SPARCLE is a cross-sectional survey in nine European regions, examining the relationship of the environment of children with cerebral palsy to their participation and quality of life. The objective of this report is to assess data quality, in particular heterogeneity between regions, family and item non-response and potential for bias.
Methods: 1,174 children aged 8–12 years were selected from eight population-based registers of children with cerebral palsy; one further centre recruited 75 children from multiple sources. Families were visited by trained researchers who administered psychometric questionnaires. Logistic regression was used to assess factors related to family non-response and self-completion of questionnaires by children.
Results: 431/1,174 (37%) families identified from registers did not respond: 146 (12%) were not traced; of the 1,028 traced families, 250 (24%) declined to participate and 35 (3%) were not approached. Families whose disabled children could walk unaided were more likely to decline to participate. 818 children entered the study, of whom 500 (61%) self-reported their quality of life; children with low IQ, seizures or inability to walk were less likely to self-report. There was substantial heterogeneity between regions in response rates and socio-demographic characteristics of families but not in age or gender of children. Item non-response was 2% for children and ranged from 0.4% to 5% for questionnaires completed by parents.
Conclusion: While the proportion of untraced families was higher than in similar surveys, the refusal rate was comparable. To reduce bias, all analyses should allow for region, walking ability, age and socio-demographic characteristics. The 75 children in the region without a population-based register are unlikely to introduce bias.

Background
SPARCLE [1] is a cross-sectional survey in nine European regions which examines the relationship between the physical, social and attitudinal environment of children with cerebral palsy and their participation in everyday activities and quality of life. Its analyses will take account of family stress and child behaviour. Children with cerebral palsy were studied because they often have impairments of learning, hearing, vision and communication, and epilepsy, in addition to motor impairment, and so are more representative of disabled children than a group with, for example, just epilepsy or movement difficulties. Importantly, there are also population registers of such children, reducing the risk of selecting a biased study population. Children aged 8-12 years were studied because they have been investigated much less than preschool children, are able to report their own quality of life and have not entered adolescence, where additional factors operate. The study is original in its methods by directly engaging children themselves, where possible, and including those with communication difficulties and severe learning difficulties even if they cannot self-report.
The aim of this paper is to assess data quality and potential for bias; the specific objectives were to assess (i) heterogeneity between the nine regions, in particular whether inclusion of children from one region who were not identified from a population-based register is likely to have introduced bias, (ii) whether family non-response is likely to have led to a biased sample, (iii) how children who self-complete differ from other children, and (iv) the extent of item non-response.

Methods
The study methods have been described in detail elsewhere [1] and are summarised briefly below.

Sample selection
Participants were selected from eight registers of children with cerebral palsy resident in defined regions: North England, West Sweden, Northern Ireland, South East France, South West Ireland, East Denmark, Central Italy, South West France (see Figure 1: map of the former European Union, between 1/1/1995 and 1/5/2004, showing the regions included in the SPARCLE study) [2]. A further region in North West Germany joined the study but its sample could not be drawn from a population-based register and was constructed from referrals from multiple sources.
Milder cerebral palsy is more common so, in regions with sufficient registered cases (North England and East Denmark), similar numbers of children were sampled at each level of severity: children were grouped by walking ability and a random sample selected from each stratum as shown in Table 1. West Sweden and Northern Ireland selected a random sample of children in the highest strata of walking ability and approached all children in the remaining strata. In regions with fewer registered cases, all children were approached.
Children with dates of birth between 31/07/1991 and 01/04/1997 inclusive were eligible. Families were interviewed between May 2004 and August 2005 when, if possible, the child was over eight and under thirteen years. The children initially selected comprised the initial sample. As several attempts were made to contact eventual non-responders, often over a long period of time, they were assigned a notional interview date of 1st January 2005. Children were excluded if they were born outside the specified dates of birth and they were also over six months outside the specified age range on the interview date, or if they had died, been misdiagnosed, moved out of the area, or their parents had language difficulties. The remaining children comprised the final sample.

Interview of children and parents
Researchers were trained together before visiting the selected families at home to administer questionnaires to parents and, if possible, children. Background information on the socio-demographic characteristics of the families and the impairment of the children was recorded. Children's ability to report their own quality of life was assessed [3,4]. If possible, their self-reported quality of life was captured using the KIDSCREEN questionnaire [5], a modern instrument with excellent psychometric properties assessing children's quality of life on ten dimensions, whose questions were derived from focus group work with children across Europe. Two additional questions, similar to questions in the Child Health Questionnaire [6], captured children's experience of pain. Parental reports of children's quality of life, participation, frequency of participation in discretionary activities, environment, general health and behaviour were recorded using the KIDSCREEN [5], Life-H [7], Frequency of Participation [1], European Child Environment [8,9,10], Child Health [6], and Strengths and Difficulties [11] questionnaires respectively. Additionally, parent reports of their children's pain were recorded if the children were unable to report their own pain. Parental stress was recorded using the Parental Stress Index - Short Form questionnaire [12]. Participation and quality of life were outcomes; environment was the main exposure; impairment, pain, socio-demographic characteristics, frequency of participation in discretionary activities, the child's behaviour, and parental stress were regarded as potential effect modifiers.

Heterogeneity between regions: socio-demographic characteristics and family non-response
We assessed heterogeneity of socio-demographic characteristics and non-response between regions. For all regions, we considered: parental educational qualifications; employment; area of domicile; type of school attended by child (these were dichotomised respectively as minimal school-leaving qualification or lower/other; at least one parent working full-time or two working part-time/other; urban/rural; mainstream/special unit in mainstream school or special school). We also considered rates of completion of questionnaires by children. For all regions except North West Germany, where children were not selected from a population-based register, we considered non-response, both overall and due to: (i) failure to trace the family (non-traceability), (ii) traced families declining to participate (refusal), and (iii) traced families not being approached by researchers (non-approach). The significance of heterogeneity between regions was assessed using chi-squared tests. The regions contributing to any significant heterogeneity were identified using a comparative risk function (CRF) [13], which compares the odds of children in a specific region being in a specific category with the corresponding odds for the entire sample. This method has the advantage of comparing each region with the entire sample and yielding confidence intervals which are independent, whereas logistic regression compares each region with one which is arbitrarily selected as the reference category and yields confidence intervals which are correlated.
Table 1 footnotes: Percentages are relative to number in: * study population; ** initial sample; † final sample. a One child was excluded as the walking ability was not recorded. b Born outwith specified dates of birth and over 6 months outside specified age range on interview date.
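The comparison of each region with the entire sample can be illustrated with a minimal sketch. It assumes a two-category outcome, uses hypothetical counts that are not SPARCLE data, and computes only the chi-squared statistic for heterogeneity and the ratio of a region's odds to the odds in the whole sample. The confidence-interval calculation of the CRF [13] is not reproduced here, and the function names are illustrative.

```python
# Hypothetical counts per region: (children in category, children not in category).
# These numbers are illustrative only and are not SPARCLE data.
counts = {"Region A": (30, 70), "Region B": (50, 50), "Region C": (40, 60)}


def chi_squared_heterogeneity(table):
    """Return (Pearson chi-squared statistic, degrees of freedom) for a regions x 2 table."""
    total_yes = sum(yes for yes, _ in table.values())
    total_no = sum(no for _, no in table.values())
    grand_total = total_yes + total_no
    statistic = 0.0
    for yes, no in table.values():
        row_total = yes + no
        for observed, column_total in ((yes, total_yes), (no, total_no)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(table) - 1


def odds_relative_to_whole_sample(table, region):
    """Odds of being in the category in one region divided by the odds in the entire sample."""
    yes, no = table[region]
    total_yes = sum(y for y, _ in table.values())
    total_no = sum(n for _, n in table.values())
    return (yes / no) / (total_yes / total_no)


statistic, df = chi_squared_heterogeneity(counts)
relative_odds = {region: odds_relative_to_whole_sample(counts, region) for region in counts}
```

A relative odds of 1 means the region matches the whole sample; the p-value for heterogeneity would be read from a chi-squared distribution with `df` degrees of freedom.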

Predictors of family non-response
We used logistic regression to assess whether non-traceability and refusal were related to age, gender and level of impairment (walking ability, presence of seizures, impairment of vision and hearing) recorded by the register at age 4-5 years. We assessed whether any such associations could explain the differences in response rates between regions [14]; odds ratios (ORs) and 95% confidence intervals (CI) are reported.
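For a single binary predictor, the odds ratio from a logistic regression equals the cross-product ratio of the corresponding 2x2 table, and a Wald-type 95% confidence interval can be formed on the log scale. The sketch below illustrates this unadjusted case only; the counts are hypothetical, the function name is an assumption, and it does not reproduce the adjusted models fitted in Stata.

```python
import math


def odds_ratio_with_ci(events_exposed, nonevents_exposed,
                       events_reference, nonevents_reference, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (events_exposed * nonevents_reference) / (
        nonevents_exposed * events_reference)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(1 / events_exposed + 1 / nonevents_exposed
                          + 1 / events_reference + 1 / nonevents_reference)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical example: refusals / participants among children who use walking
# aids (20 / 80) versus children who walk unaided (40 / 60). Not SPARCLE data.
odds_ratio, lower, upper = odds_ratio_with_ci(20, 80, 40, 60)
```

An interval that excludes 1 indicates an association at roughly the 5% level; the paper applies the stricter threshold of p < 0.01.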
For each region, the method of approach to families was classified as direct (initial contact with the family was by the researcher) or indirect (initial contact was by a doctor, social worker or therapist); see Table 2. Logistic regression was then used to assess whether non-traceability or refusal was related to the method of approach.
We calculated weights which allowed for (i) the sampling strategy and (ii) both the sampling strategy and non-response. Use of weights usually reduces bias but increases the variance of estimates. We estimated the probable loss of efficiency (percentage increase in variance) which would be induced by using weights when they are not necessary to reduce bias [15].
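The trade-off between bias and variance can be made concrete with a small sketch. It assumes that sampling weights are the inverse of each stratum's sampling fraction and uses Kish's approximation (one plus the squared coefficient of variation of the weights) for the variance inflation when weights are unrelated to the outcome. This is one common approximation and may differ from the method in [15]; the stratum sizes are hypothetical.

```python
def sampling_weights(population_sizes, sample_sizes):
    """Inverse-probability weight per stratum: registered children / sampled children."""
    return {stratum: population_sizes[stratum] / sample_sizes[stratum]
            for stratum in population_sizes}


def kish_variance_inflation(unit_weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(unit_weights)
    sum_w = sum(unit_weights)
    sum_w_squared = sum(w * w for w in unit_weights)
    return n * sum_w_squared / sum_w ** 2


# Hypothetical register with milder cases over-represented; not SPARCLE data.
population = {"walks unaided": 200, "walks with aids": 100, "unable to walk": 100}
sampled = {"walks unaided": 100, "walks with aids": 100, "unable to walk": 100}

weights = sampling_weights(population, sampled)
# Expand to one weight per sampled child.
unit_weights = [weights[s] for s in sampled for _ in range(sampled[s])]
percent_increase = 100 * (kish_variance_inflation(unit_weights) - 1)
```

With equal weights the inflation factor is exactly 1, so the percentage increase in variance is zero; the more the weights vary, the larger the loss of efficiency.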

Characteristics of children not recorded by registers and ineligible children
We assigned weights which allowed for the sampling strategy to children selected from population-based registers, so that the weighted observations represented the characteristics of children on the registers who would have been included in the study in the absence of sampling. We then compared the age, gender and level of impairment as assessed at interview, of children in North West Germany with that of children in the weighted sample [15]. Logistic regression was also used to assess whether the few included children who did not meet the inclusion criteria differed from eligible respondents.

Characteristics of self-completing children
Children with learning difficulties were expected to have difficulty in completing the KIDSCREEN questionnaire themselves. We therefore used univariate logistic regression to examine how the proportion of children completing the KIDSCREEN questionnaire varied with the children's IQ. We then assessed whether self-completion depended on age, gender and level of impairment as assessed at interview, after allowing for IQ. Finally, we assessed whether the various aspects of impairment were independent predictors of self-completion by removing each one in turn from a multivariable logistic regression and referring to the likelihood ratio test statistic.

Item non-response and missing scores
For parents and children who responded, we calculated the proportion of items not completed on each questionnaire. Many of the questionnaires are scored using algorithms which allow calculation of domain scores even if responses to a few items are missing. Therefore we also calculated the proportion of missing scores for each questionnaire.
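The distinction between missing items and missing scores can be sketched as follows. The rule that a domain score is computed when at most one item is missing is a hypothetical stand-in: each questionnaire has its own published scoring algorithm, which is not reproduced here. The responses are invented for illustration.

```python
def item_nonresponse_rate(responses):
    """Proportion of items left unanswered (None) across all respondents."""
    total_items = sum(len(items) for items in responses)
    missing_items = sum(1 for items in responses for item in items if item is None)
    return missing_items / total_items


def domain_score(items, max_missing=1):
    """Mean of answered items, or None if too many items are missing.

    max_missing is a hypothetical tolerance, not a rule from any named questionnaire.
    """
    answered = [item for item in items if item is not None]
    if len(items) - len(answered) > max_missing:
        return None
    return sum(answered) / len(answered)


# Three hypothetical respondents answering a four-item domain.
responses = [[1, 2, None, 4], [3, None, None, 5], [2, 2, 2, 2]]

item_rate = item_nonresponse_rate(responses)
scores = [domain_score(items) for items in responses]
missing_score_rate = sum(score is None for score in scores) / len(scores)
```

Because scoring tolerates a few missing items, the proportion of missing scores need not equal the proportion of missing items, which is why both are reported.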
To mitigate the effects of multiple analyses, a p-value < 0.01 was regarded as significant in all analyses. Statistical analyses were performed using Stata 9 [16].

Consent and Ethics approval
The research was carried out in compliance with the Helsinki Declaration. All parents gave written consent. All children with sufficient cognitive capacity gave written consent or communicated consent if unable to write. Ethics committee approvals were obtained in each country.

Results
The study population, initial sample selected, reasons for exclusion after sampling and final sample are summarised in Table 1. The final sample comprised 1,249 children: 1,174 children selected from the registers, plus 75 children recruited by North West Germany. Due to administrative error, 27 of these children (24 responders and 3 non-responders) were born outside the specified time period and 40 children were interviewed when their age was up to six months outside the specified age range.

Non-response of families selected from population-based registers
Of the 1,174 included families, 431 (37%) did not respond: 146 (12%) could not be traced; of the 1,028 traced families, 250 (24%) declined to participate and 35 (3%) were not approached by researchers (11 because professionals requested this, 6 because foster parents cared for them, 6 because the children reached 13 years before the family was approached, and 12 for other reasons). There was significant heterogeneity between regions in rates of non-response, non-traceability and refusal (p < 0.001). Table 3, section (a), shows that compared to all included families, West Sweden and Northern Ireland had a lower response, largely because families who were approached often declined to participate, whereas South West Ireland, East Denmark and Central Italy had a higher response because of a combination of effective tracing of families and a high level of co-operation among those who were traced. Decisions not to approach traced families also appeared to be influenced by regional factors: 10 of the 11 families which professionals requested should not be approached were in one region and 5 of the 6 children in foster care were in another region.
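The denominators behind these percentages differ: non-traceability is expressed relative to all selected families, whereas refusal and non-approach are expressed relative to traced families. The short check below recomputes the reported figures from the counts given in the text; the variable names are illustrative.

```python
selected = 1174          # families selected from the eight registers
not_traced = 146
declined = 250
not_approached = 35
reasons_not_approached = [11, 6, 6, 12]  # professionals, foster care, age, other

traced = selected - not_traced
non_responders = not_traced + declined + not_approached
register_responders = selected - non_responders


def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * numerator / denominator)


overall_non_response = percent(non_responders, selected)   # relative to all selected
non_traceability = percent(not_traced, selected)           # relative to all selected
refusal = percent(declined, traced)                        # relative to traced families
non_approach = percent(not_approached, traced)             # relative to traced families
```

The 743 register responders plus the 75 children recruited in North West Germany give 818, which matches the number of responders reported for the final sample.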
Rates of tracing families did not vary significantly (p > 0.01) with any of the variables considered (see Table 4). Among traced families, those whose children walked using aids or could not walk were significantly less likely to refuse (OR for refusal compared to children who walked without aids = 0.5, 95% CI: 0.4 to 0.8, and 0.6, 95% CI: 0.4 to 0.9, respectively). The gender, epileptic status, vision and hearing of the disabled child were not significantly associated with refusal. The significant heterogeneity between regions in rates of refusal remained after adjusting for walking ability (p < 0.001).
The five regions which used a direct approach had similar rates of non-traceability to the three regions which used an indirect approach (average rates of 13% and 12% respectively), both unadjusted (see Table 4) and after adjusting for age. Likewise, the direct and indirect approaches resulted in similar rates of refusal (24% and 25% respectively), both unadjusted and after adjusting for age and walking ability. As recent events in Sweden (discussed below) [17] may have led to the high refusal rate (52%) there, we repeated these comparisons excluding Sweden and found a lower refusal rate (12%) in the four regions which used a direct approach (OR for refusal comparing indirect with direct approaches = 2.4, 95%CI: 1.7 to 3.5). Findings were similar when overall non-response of traced families (both refusal and non-approach) was considered.
The estimated percentage increases in variance induced by using sampling weights to allow for the sampling strategy and, additionally, non-response, were 22% and 32% respectively.

Socio-demographic characteristics of all included children and families
Among the 818 responders in the final sample, the mean age of the children on date of interview was 10 1/2 years; 59% were boys and 41% girls. The distribution of age and gender was similar across regions.

Characteristics of children not identified from registers and ineligible children
Weighted logistic regression indicated that age, gender and levels of impairment, as assessed at interview, among children recruited in North West Germany were similar to those of eligible children selected from the population-based registers.
24 (3%) of the children of included responders were not strictly eligible as they were born outside the specified time period. Logistic regression, comparing these ineligible children with eligible children, showed no indication that they differed in terms of impairment, as assessed at interview, or gender.

Completion by children
500/818 (61%) children completed the KIDSCREEN questionnaire, with some variation in self-completion rates between regions as shown in Table 3, section (c). Children with IQ less than 70 were much less likely than other children to complete the questionnaires. Completion rates did not vary significantly with the children's age or gender. Children who were unable to walk and those with seizures, impaired hearing or vision were less likely to complete the questionnaires (see Table 4); but only inability to walk, seizures and vision remained significant after allowing for IQ and only IQ, inability to walk and seizures were independently significant (p < 0.01).

Item non-response
Age and gender were reported for all 818 children included in the final sample. Only eight children (0.4%) had missing data on impairment items. For questionnaires administered to parents, the proportion of items not completed ranged from 0.4% for the impairment questionnaire to 5% for the KIDSCREEN questionnaire. Children failed to complete 2% of items (see Table 5).

Discussion
Family non-response
The overall rate of non-response among families was 37%: 12% non-traceability and, among those traced, 24% refusal and 3% non-approach. These non-response rates should be evaluated by comparison with surveys of a similar design, which target specific families and conduct face-to-face interviews. A recent British national survey of the mental health of over 12,000 children sampled specific families and conducted face-to-face interviews with parents and children [18]. This survey reported that 5% of families could not be traced and 28% of those traced either opted out, refused an interview or were not approached. Hence its refusal rate for families was similar to that of SPARCLE, but it traced a higher proportion of families, probably because its sample was drawn from child benefit records, so the families had an incentive to keep their address records up-to-date.
The identification of cases from population-based registers which ascertain cases from multiple sources mitigates selection bias but cannot reduce non-response bias. Although we wished to compare the level of non-response in our study with that in similar epidemiological studies which identified children from population-based registers and administered face-to-face interviews, we were unable to find reports of any such studies. Hence we are unable to comment on whether our findings are typical of such register-based studies.
Many national surveys of households conducted in the 1990s reported lower non-contact rates (typically 2% to 8%) and lower refusal rates (typically 2% to 29%) than SPARCLE [19,20]. These surveys usually attempted to contact residents at specific addresses whereas SPARCLE attempted to trace specific families. Further, they were often conducted by telephone or mail and some allowed proxy respondents within households. All these factors would probably have led to higher response rates.
Families whose children could walk without aids were less likely to agree to participate. Refusal is influenced by individual, household and societal factors: men, people with lower educational qualifications, those from ethnic minorities and those living in urban areas are often less likely to co-operate [19,20]. We were unable to record these factors for non-respondents in SPARCLE.
Rates of non-traceability and refusal varied significantly between regions. This is not surprising as the method of approach varied substantially between regions, reflecting variation not only in national legislation interpreting the European directive on data privacy [21,22], but also in implementation of legislation in different regions. Overall, rates of non-traceability and refusal were similar in regions which approached families directly and in those which approached them indirectly. However, the level of refusal in Sweden, where families were approached directly, was twice that in other regions; this was almost certainly due to recent controversial events when a judicial judgement demanded release of patient records [17]. Excluding Sweden, regions which approached families directly had about half the refusal rate of other regions. This finding should be interpreted cautiously, firstly because it follows exclusion of a region after noting its unusual data and, secondly, because it is based on comparison of only seven remaining regions.

Possible bias
Results of analyses could be subject to bias if the sampling strategy is not taken into account, or if non-response or inclusion of unrepresentative children resulted in a biased sample.
It is possible to adjust for the sampling strategy either by using sampling weights or by adjusting for the design variables (region and walking ability) [15]. If it is assumed that non-response depends only on observed variables (and we have shown that it is related to region, age and walking ability), similar strategies can be used to allow for the effects of non-response. Sampling weights generally reduce bias but increase the variance of estimates. Allowing for the sampling strategy and non-response by using weights would probably increase the variance of estimates by about a third; therefore adjustment for region, walking ability and age may be preferred.
The age, gender and levels of impairment of children in North West Germany, who were identified from referrals from multiple sources, were similar to those of children recorded on population-based registers. Children who were ineligible because their date of birth was outside the specified range were similar to other included children in terms of gender and impairment. Therefore it is unlikely that inclusion of these children has introduced any substantial bias.
The proportion of missing items and scores among respondents was low. The correlates and possible effects of these missing data will be evaluated in reports considering each questionnaire.

Heterogeneity between regions
The socio-demographic characteristics of the families varied substantially between regions. Census data show that, in North East England and Northern Ireland, a high proportion of families with dependent children have no working adults [23] and a high proportion of adults have no educational qualifications [24,25]. Therefore it is not surprising that the SPARCLE samples from North England (where the majority of the population live in the North East part) and Northern Ireland compared unfavourably with other European regions in these respects. The high proportion of families with at least one parent working in South West France is probably due to a combination of factors: this region had the highest proportion of included children with good walking ability (47%) and a high proportion of mothers with good educational qualifications who are more likely to work; it is also very well equipped in terms of institutional care for handicapped children [26]. The higher proportion of parents in Denmark and Sweden who had qualifications above the minimum probably reflects the high level of participation in job-related continuing education in those countries [27]. The type of school attended by the child also varied substantially between regions, partly reflecting variation in national policies [28].
This heterogeneity between regions is a strength of the study, giving greater statistical power to examine the effects of factors which, if improved, would yield the greatest benefit to disabled children and their families. It was not the purpose of the SPARCLE study to draw conclusions about differences between regions or countries and, indeed, it would be misleading to do so, because some of the apparent differences reported in Table 3 may be chance findings consequent to multiple analyses and some may be due to the effects of differences between regions in socio-demographic characteristics. As other differences between regions, not recorded by the study, are highly likely, it is important to adjust for region in all analyses.

Children's self-report
Although we attempted to use objective criteria [3,4] to indicate whether a child could self-report, interviewers reported that some children who satisfied the criteria did not always understand the questions and their self-reports were therefore excluded. Nevertheless, self-completion rates were similar between regions. As expected, children with lower IQ were less likely to self-complete. However, somewhat surprisingly, after allowing for IQ, children who were unable to walk and those with seizures were less likely to self-complete.

Comparison with other studies
While a few regional and national studies evaluate whether non-response is likely to have introduced bias [24,29], and some studies report the extent of non-response without evaluation of its effects [18,30,31,32], most small studies of disabled children which we identified did neither.
We did not find any evaluations of differences in response rates between surveys which used direct and indirect approaches to potential participants. As ethical requirements concerning recruitment of participants in clinical studies are now influenced by recent European directives on data privacy [21] and good clinical practice [33], surveys may be more frequently required to approach potential participants indirectly. The consequent effect on response rates is of considerable interest.

Conclusion
While refusal rates were comparable to those in similar surveys, non-traceability rates were higher, probably because SPARCLE identified potential participants from population-based case registers which may not have had up-to-date addresses of those who moved after registration. Item non-response was low. The small number of ineligible children and children not selected from population-based registers are unlikely to introduce bias.
To reduce bias due to the sampling strategy, differential non-response, and heterogeneity between regions, all analyses should allow for region as either a fixed or random effect, allow for walking ability and age either by adjusting or by using sampling weights, and evaluate whether further adjustment for socio-demographic characteristics is advisable.
Research is needed to evaluate whether indirect approaches to families result in lower response rates than direct approaches. Studies should routinely report their method of approach to participants and their response rates, to allow meta-analysis of response rates subgrouped by type of approach.
Surveys rarely assess data quality prior to analysis and they should routinely do so.