Design and respondent selection of a population-based study on associations between breast cancer screening, lifestyle and quality of life

Background Only a few studies have integrated breast cancer screening, lifestyle, and quality of life. Potential bias due to selective non-response may distort the associations being investigated. We describe the design of a Finnish population-based study on associations between breast cancer screening and various indicators of lifestyle and quality of life, and evaluate the level of bias among the respondents from the first study rounds over 2 years. Methods The study target population of 10 000 women aged 49 years was randomly drawn from the Finnish National Population Registry. The data included birth year, marital status, municipality, and primary language. Data on education were retrieved from Statistics Finland. Questionnaires focusing on lifestyle-related risk factors and quality of life were sent to the target population in 2012–13, 1 year before the first invitation to organized breast cancer screening. We evaluated associations between willingness to respond and demographic characteristics in the eligible study population. Additionally, we examined associations between the demographic characteristics and the Satisfaction With Life Scale (SWLS), and evaluated the impact of non-response using inverse probability weighting and multiple imputation. Results The questionnaire response proportion was 52.4 %. Compared to non-respondents, respondents were more often married, academically educated, and native speakers of Finnish or Swedish. Nevertheless, the estimates of the SWLS among the respondents were in line with those corrected for non-response in the eligible study population. Conclusions Based on the SWLS, the respondents are representative of women in the entire eligible study population.


Background
Breast cancer is the most common cancer and the most frequent cause of cancer death among women worldwide [1]. Screening is a major component of secondary breast cancer control. Cumulative evidence from randomised trials and observational studies has demonstrated that screening is effective in reducing breast cancer mortality among women aged 50-69 years [2][3][4][5][6].
Lifestyle is a major modulator of breast cancer risk, and changes in lifestyle have been shown to affect quality of life [7][8][9][10]. Both desirable and harmful lifestyle changes due to participation in screening have been reported from colorectal and lung cancer screening trials [11]. The results suggest that screening may induce desirable lifestyle changes but may also provide false reassurance that encourages participants to continue or to start unhealthy behaviour.
Previous studies on population-based breast cancer screening have concentrated on the screening process (participation, recall rate, false positive and false negative rate) and the outcome (mortality reduction) [12,13]. Some have also investigated psychological distress due to false positive mammograms [14,15]. No studies, so far, have assessed quality of life among the majority of the screened women, i.e. those receiving a normal or a false-negative screening finding. Furthermore, no studies have examined the impact of breast cancer screening on lifestyle.
In 2012, the Finnish Cancer Registry launched a population-based study to evaluate associations between breast cancer screening, lifestyle, and quality of life among middle-aged Finnish women. We report here the design of the study and assess the overall response, the phases of response, and the influence of non-response on the associations being investigated, using demographic characteristics derived from the Finnish National Population Registry (FNPR) and Statistics Finland (SF), and the Satisfaction With Life Scale (SWLS) included in the study questionnaire.

Methods
The study target population of Finnish women born in 1963 (n = 5000) and in 1964 (n = 5000) was randomly drawn from the FNPR in 2012 and in 2013, respectively, using the year of birth as the only restricting factor. The study material, comprising questionnaires with information letters and informed consent forms, was mailed to the target population in 2012 and 2013, one year before the first invitation to organized breast cancer screening at the age of 50 years. The same study material will be mailed 1 year after the first screening invitation to the same women (in 2014 and 2015, respectively).
The study questionnaire focuses on perceived and lifestyle-related risk factors and lifestyle indicators, such as breast cancer in the family, concerns about breast cancer, hormone-related factors, hormonal replacement, dietary habits, physical activity, obesity, and smoking. Factors relevant for mammography screening, such as screening experiences, screening outcome, and use of spontaneous breast cancer screening, are also addressed.
The study has three phases in the mailing process. In the first and in the third phase, the eligible women (or those who have not yet responded) receive all study material. In the second phase, they receive only a reminder letter (Fig. 1). The study participants are those who return both a completed questionnaire and a completed informed consent form; all others are non-participants.
Demographic characteristics for the whole study target population were obtained from the FNPR and the SF in 2012 and 2013. The FNPR data included year of birth, marital status, primary language, municipality, and data on children (birth year and sex). The SF data included information on education and occupation. Data on attendance, findings (true or false negative, true or false positive), breast cancer diagnoses, and deaths from breast cancer will be derived from the Finnish Mass Screening and Cancer Registries. The FNPR, the SF, the questionnaire, and the registry data can be linked using a social security number, which is unique to every person in Finland.
We calculated overall and phase-specific numbers of respondents and non-respondents, and the response rates among the study target population from the first study rounds over the years 2012 and 2013. For the number and rate of response, the target population was followed from the date of the first mailing phase in 2012 and 2013 until January 31st of the next year (2013 and 2014, respectively). Associations between non-response and the demographic characteristics were analysed using Poisson regression, and are reported as incidence rate ratios (IRR). Associations between the three mailing phases and the demographic characteristics were analysed using ordinal logistic regression, and are reported as proportional odds ratios (POR). In each analysis, models without interaction terms were sufficient to describe the data.
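To illustrate how an incidence rate ratio for non-response can be read, the following minimal sketch computes a crude ratio of non-response proportions between two groups, with a log-scale Wald interval under a Poisson assumption. It is not the regression model fitted in the study: the function name `crude_rate_ratio` and the counts are hypothetical and serve only as an example.

```python
import math


def crude_rate_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Crude ratio of non-response proportions (group A vs. reference group B).

    The interval is a log-scale Wald interval that treats the two counts
    as independent Poisson variables, so se(log ratio) = sqrt(1/a + 1/b).
    """
    ratio = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts, NOT taken from the study:
# 600 non-respondents among 1000 women in group A,
# 450 non-respondents among 1000 women in the reference group B.
irr, lower, upper = crude_rate_ratio(600, 1000, 450, 1000)
print(f"IRR {irr:.2f} (95 % CI {lower:.2f}-{upper:.2f})")
```

A ratio above 1 with an interval excluding 1 would indicate more frequent non-response in group A than in the reference group. A regression model additionally adjusts each ratio for the other covariates, which this crude calculation does not.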
The demographic characteristics applied in the analyses were the birth cohort (1963 or 1964), marital status, primary language, education, and university hospital region (based on information on municipality). Marital status was divided into the following categories: married (including common-law marriages), single, divorced, and widowed. Primary language was divided into the categories Finnish, Swedish, and other, and education into the categories primary (comprehensive education, 0-9 years), secondary (upper secondary general and/or vocational education, 9-12 years), and tertiary (higher and/or academic education, 12+ years). The five university hospital regions Helsinki (HYKS), Kuopio (KYS), Oulu (OYS), Tampere (TAYS) and Turku (TYKS) represented both geographical variation and density of the survey target population: Helsinki as the southern capital area (the most urban), Kuopio as the eastern area (mostly rural), Oulu as the northern area (mostly rural), Tampere as the central area (mostly urban), and Turku as the west-coast area (mostly urban). Those with unknown marital status as well as those living in the islands of Åland were excluded from the analyses due to the small number of observations.
Potential bias due to non-response was addressed using the Satisfaction With Life Scale (SWLS) as an example. The SWLS is a widely used, five-statement generic instrument designed to measure cognitive judgements of satisfaction with one's life [16]. The respondents are asked to indicate their agreement with each of the statements on a scale from 1 to 7. The final score varies from 5 to 35 and is divided into seven categories, where the lowest category (5-9) describes those extremely dissatisfied, and the highest (31-35) those extremely satisfied with their lives.
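The scoring described above can be sketched as follows. The lowest (5-9) and highest (31-35) bands are those given in the text; the five intermediate cut-offs and all labels other than the two extremes are an assumption that follows the commonly published SWLS scoring, and the function names are hypothetical.

```python
# Score bands for the summed SWLS score. Only the first and last bands are
# stated in the text; the intermediate bands are assumed from the commonly
# published SWLS scoring.
SWLS_BANDS = [
    (5, 9, "extremely dissatisfied"),
    (10, 14, "dissatisfied"),
    (15, 19, "slightly dissatisfied"),
    (20, 20, "neutral"),
    (21, 25, "slightly satisfied"),
    (26, 30, "satisfied"),
    (31, 35, "extremely satisfied"),
]


def swls_score(items):
    """Sum five item responses, each an integer from 1 to 7."""
    if len(items) != 5 or any(not 1 <= item <= 7 for item in items):
        raise ValueError("SWLS needs exactly five responses in the range 1-7")
    return sum(items)


def swls_category(score):
    """Map a summed score (5-35) to its satisfaction category."""
    for low, high, label in SWLS_BANDS:
        if low <= score <= high:
            return label
    raise ValueError("SWLS score must lie between 5 and 35")


example_score = swls_score([6, 5, 6, 7, 5])
print(example_score, swls_category(example_score))
```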
The SWLS score was first analysed as a function of demographic characteristics among the study respondents using ordinal logistic regression. Thereafter, inverse probability weighting (IPW) and multiple imputation (MI) were employed to find out whether the associations between the demographic characteristics and the SWLS score among the study respondents were similar to those in a corrected, complete data set, i.e. a data set with a hypothetical 100 % response rate [17]. In the IPW approach, the complete data set was generated by weighting each observed response by the inverse of its predicted probability of being observed. In the MI approach, the SWLS estimates for the non-respondents were generated with a set of 50 imputations from the observed respondent data.
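The weighting idea behind the IPW approach can be illustrated with a minimal sketch. Here the probability of responding is estimated simply as the observed response proportion within each demographic stratum, which stands in for whatever response model was actually used in the study. The records, the strata, and the function name `ipw_mean` are hypothetical.

```python
from collections import defaultdict


def ipw_mean(records):
    """Inverse-probability-weighted mean score.

    records: list of (stratum, responded, score) tuples; score is None
    for non-respondents. The response probability of each stratum is
    estimated as responded / invited within that stratum.
    """
    invited = defaultdict(int)
    responded = defaultdict(int)
    for stratum, did_respond, _ in records:
        invited[stratum] += 1
        responded[stratum] += int(did_respond)

    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, did_respond, score in records:
        if not did_respond:
            continue
        propensity = responded[stratum] / invited[stratum]
        weight = 1.0 / propensity  # each respondent stands in for 1/p women
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical toy data, NOT from the study: married women respond more
# often (3 of 4) than single women (1 of 4).
toy = [
    ("married", True, 28), ("married", True, 27),
    ("married", True, 29), ("married", False, None),
    ("single", True, 20), ("single", False, None),
    ("single", False, None), ("single", False, None),
]
observed = [score for _, did_respond, score in toy if did_respond]
naive_mean = sum(observed) / len(observed)
weighted_mean = ipw_mean(toy)
print(naive_mean, weighted_mean)
```

In this toy example the unweighted respondent mean is pulled towards the over-represented stratum, whereas the weighted mean restores the stratum shares of the invited population. The correction is only as good as the assumption that respondents and non-respondents within a stratum are alike.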
Since relationships between the demographic characteristics and life satisfaction may vary between the respondents and the non-respondents (e.g. the married respondents may be more or less satisfied with their lives than the married non-respondents), alternative assumptions on the distribution of the SWLS in relation to marital status and education were generated. The observed marginal distribution of the SWLS among the respondents was impaired and improved by 4 % for the non-respondents, yielding two new SWLS scores for the corrected, complete data set. Thereafter, these new, overall SWLS scores were compared with the previously formulated scores.
The Helsinki and Uusimaa Hospital District Ethics Committee has approved the study design (17.4.2012, 43/13/03/00/2012) and the National Institute for Health and Welfare has given permission to perform the study and use the data (20.2.2014, THL/1697/5.05.00/2013).
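One possible reading of this sensitivity analysis is sketched below: the mean score assumed for the non-respondents is set 4 % lower or 4 % higher than the respondent mean, and the overall mean of the complete data set is recomputed. This interpretation, the respondent mean of 25, and the function name are assumptions for illustration only; the response rate of 53.0 % is the one reported for the eligible study population.

```python
def corrected_mean(resp_mean, response_rate, shift):
    """Overall mean when non-respondents' mean is resp_mean * (1 + shift)."""
    nonresp_mean = resp_mean * (1 + shift)
    return response_rate * resp_mean + (1 - response_rate) * nonresp_mean


RESP_MEAN = 25.0      # hypothetical respondent mean SWLS score
RESPONSE_RATE = 0.53  # response rate in the eligible study population

# Impaired (-4 %), unchanged, and improved (+4 %) scenarios.
scenarios = {
    shift: corrected_mean(RESP_MEAN, RESPONSE_RATE, shift)
    for shift in (-0.04, 0.0, 0.04)
}
for shift, mean in scenarios.items():
    print(f"shift {shift:+.0%}: overall mean {mean:.2f}")
```

Under these assumptions a 4 % shift among the non-respondents moves the overall mean by about 1.9 %, because non-respondents make up 47 % of the eligible population.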

Results
The overall response rate in 2012-2013 was 52.4 % in the entire target population (n = 10 000), and 53.0 % in the eligible study population (n = 9894). The non-reachable members of the target population (n = 106) were not included in the eligible study population. These were women who refused to participate, did not return the informed consent, could not be reached by mail, or had died during the study period.
The response rates after the first mailing phase were 27.3 % in the birth cohort 1963, and 29.1 % in the birth cohort 1964. The corresponding percentages after the second phase were 41.5 and 45.6 %, and after the third phase 51.8 and 53.0 %, respectively (Table 1).
The distribution of demographic characteristics among the respondents, the non-respondents, and the eligible study population is presented in Table 2. Compared to the non-respondents, the respondents were more often married, highly educated, and native speakers of Finnish or Swedish. The geographical distribution as well as the distribution by birth cohort was similar among the respondents and the non-respondents. There were differences in the accumulation of respondents between the two birth cohorts over the three mailing phases in 2012 and 2013 (POR 1.08, 95 % CI 1.00-1.17) (Table 3). Nevertheless, the overall number of respondents as well as the distribution of demographic characteristics was similar in both birth cohorts (IRR 1.02, 95 % CI 0.97-1.08).
Most of the study respondents were satisfied with their lives (Fig. 2). Associations between the demographic characteristics and the SWLS among the survey respondents and in the two corrected data sets are presented in Table 4 as PORs with 95 % CIs. The results show that the PORs of life satisfaction within each demographic category are similar among the study respondents and in both of the corrected data sets. The reference categories "married", "tertiary education", "Helsinki region (HYKS)", "the birth cohort 1963", and "Finnish language" represent those most satisfied.
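The two response rates can be checked against each other with simple arithmetic. The sketch below derives an approximate number of respondents from the rounded rate of 52.4 %; the exact count is not stated in this section, so `respondents_approx` is an approximation and not a reported figure.

```python
TARGET = 10_000        # entire target population
NON_REACHABLE = 106    # excluded from the eligible study population
OVERALL_RATE = 0.524   # response rate in the target population

eligible = TARGET - NON_REACHABLE

# Approximate count implied by the rounded rate (assumption: the exact
# number of respondents is not given in this section).
respondents_approx = round(TARGET * OVERALL_RATE)

eligible_rate = 100 * respondents_approx / eligible
print(eligible, respondents_approx, round(eligible_rate, 1))
```

The implied rate among the 9894 eligible women rounds to 53.0 %, in agreement with the reported figure.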
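The cohort-specific cumulative rates can be pooled to show how much each mailing phase added. The sketch below assumes that both cohorts contribute the full 5000 women of the target population as the denominator; the variable names are hypothetical.

```python
# Cumulative response rates (%) after mailing phases 1, 2 and 3.
CUMULATIVE = {
    "1963": [27.3, 41.5, 51.8],
    "1964": [29.1, 45.6, 53.0],
}
# Assumption: rates refer to the full target population of each cohort.
COHORT_SIZE = {"1963": 5000, "1964": 5000}


def pooled(phase_index):
    """Size-weighted average of the cohort rates for one phase."""
    total = sum(COHORT_SIZE.values())
    weighted = sum(
        COHORT_SIZE[cohort] * CUMULATIVE[cohort][phase_index]
        for cohort in COHORT_SIZE
    )
    return weighted / total


pooled_rates = [round(pooled(i), 2) for i in range(3)]

# Percentage points gained in each phase relative to the previous one.
gains = [
    round(after - before, 2)
    for before, after in zip([0.0] + pooled_rates, pooled_rates)
]
print(pooled_rates, gains)
```

Under this assumption the pooled rates after the first and third phases are 28.2 and 52.4 %, which agrees with the overall figures reported for the target population.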
Improving or impairing the life satisfaction (SWLS score) in relation to marital status or education among the non-respondents did not change associations between the SWLS and the demographic characteristics among the respondents and the corrected data sets (data not shown).

Discussion
We present the design of a Finnish study that evaluates the impact of breast cancer screening on self-reported lifestyle and quality of life. We also report response rates, analyse the distribution of demographic characteristics over the respondents and non-respondents from the first two study rounds, and evaluate the impact of non-response on the Satisfaction With Life Scale (SWLS) as an example.
The Finnish study is conducted during the years 2012-2015 among 10 000 randomly selected women born in 1963 and 1964 by sending them a questionnaire 1 year before and 1 year after their first invitation to organised breast cancer screening. After 2015, lifestyle and quality of life among the study respondents will be examined in relation to screening participation and results.
The first two rounds of the study were carried out in 2012 and 2013. During these years, the overall response rate among the target population was 52.4 %. Modest response rates have also been reported from other European studies [18][19][20][21]. Empirical assessments over the past decade have, however, shown that response rates may not be as strongly associated with the quality or representativeness of the study as has been believed. Even non-direct relationships between the response rate and the non-response bias have been reported [22]. It thus seems that the degree to which sampled respondents differ from the eligible survey population as a whole is central to evaluating representativeness. Therefore, a study with a relatively high response rate may produce more biased results than a study with a lower response rate from a truly random and representative group of respondents [23][24][25].
Our study included three mailing phases within both rounds in 2012-2013. During the first and the third phase, the eligible population (or those who had not yet responded) received a questionnaire with an information letter and an informed consent form. During the second phase, the eligible population received only a reminder. The additional mailing phases increased the overall response rate from 28.2 to 52.4 %. The accumulation of data differed between the study years (and birth cohorts). This did not, however, affect the distribution of demographic characteristics between the respondents and non-respondents. Additional contacts with reminders have been successful in increasing the sample size also in other studies [26]. Nevertheless, re-contacts have been criticised for inflating costs and for their low impact on the response rate and data quality, especially after the second contact [27].
Compared to the non-respondents, the respondents of our study were more often married, highly educated, and native speakers of Finnish or Swedish. This is in line with several previous studies, which have reported elderly, married, and educated women to be the most frequent respondents in health care studies [19,21,28,29]. Despite these differences, the addressed quality of life estimate (the Satisfaction With Life Scale, SWLS) was similar among the respondents and in the corrected data sets constructed by inverse probability weighting (IPW) and multiple imputation (MI). Moreover, improving or impairing the SWLS score in relation to marital status or education among the non-respondents did not change the overall distribution of the SWLS in the corrected data.
Adopting a comprehensive strategy to investigate missing data early in the research process gives researchers information necessary to evaluate key assumptions. Both the IPW and the MI methods are widely used to assess or improve the accuracy of results in various study designs [17]. In Finland, the IPW method has previously been applied e.g. to improve accuracy of results of a population survey using sociodemographic register data covering the whole study sample [30]. In the United States, the IPW and the MI methods have been utilized also to examine internal validity of estimates derived from longitudinal studies [31,32].
Table 4 Proportional odds ratios (PORs) with 95 % confidence intervals of the SWLS for various data. These data consist of the respondents and two corrected, complete data sets. The complete data sets were generated from the respondent data using inverse probability weighting (data a) and multiple imputation (data b)