Data origin and description
The data came from three population-based cancer registries of the Francim network (Doubs, Loire-Atlantique, and Tarn). The study has been described in detail elsewhere [11]. Briefly, all women over 18 years old residing in the area covered by the registries and newly diagnosed with a primary BC in 2007 were included. Men, and women with a known history of BC, lymphomas or sarcomas, were not included. Women with no information regarding stage at diagnosis and whose residential address was not known were excluded. Data on demographic and clinical characteristics were collected directly by the cancer registries from clinical records in care centres. A self-administered questionnaire was sent to women to collect information on their health behaviours, their care trajectory before the first hospitalization and their individual socioeconomic position (SEP). Women were classified as respondents if they fully completed the self-administered questionnaire and as non-respondents if they did not complete the questionnaire or if data were missing for any question on individual SEP. The study was approved in 2008 by national ethical committees (CNIL (n°907,172) and CCTIRS (n°08075)).
Stage at diagnosis
Stage at diagnosis was defined in accordance with the 5th edition of the TNM (Tumour, Nodes, Metastasis) classification of malignant tumours [12]. We defined two groups according to initial prognosis: early-stage BCs (Tis/T1 N0 M0) and advanced-stage BCs (T2/T3/T4 N0 M0, or N+, or M+).
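The grouping rule can be written as a short function. This is only an illustrative sketch: the function name, the argument names and the string labels are assumptions introduced here, not part of the study's data dictionary.

```python
def classify_stage(t_category: str, node_positive: bool, metastatic: bool) -> str:
    """Return 'early' for Tis/T1 N0 M0 and 'advanced' for any other combination."""
    allowed = {"Tis", "T1", "T2", "T3", "T4"}  # hypothetical coding of the T category
    if t_category not in allowed:
        raise ValueError(f"unexpected T category: {t_category!r}")
    # Any nodal involvement (N+) or distant metastasis (M+) means advanced stage.
    if node_positive or metastatic:
        return "advanced"
    # With N0 M0, only in situ (Tis) and T1 tumours count as early stage.
    return "early" if t_category in {"Tis", "T1"} else "advanced"
```

Under this coding, a T1 tumour with positive nodes is classified as advanced, as is a T2 N0 M0 tumour.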
Women’s socioeconomic position
Individual-level SEP measure
Three SEP variables were used to approximate individual social deprivation: level of education (lower than high school, high school or higher), occupation (manual, non-manual), and income (tax exemption, or not). We chose to create a multidimensional individual index to capture broader dimensions of individual deprivation. Each SEP variable was scored 1 in case of lower socioeconomic position (education lower than high school, manual worker, and tax exemption) or 0 otherwise. The three scores were then summed to compute an individual-level deprivation index (IDI) ranging from 0 (least deprived) to 3 (most deprived).
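As a minimal sketch of this scoring, assuming each item is stored as a boolean and that a missing answer is stored as None (both are conventions chosen here for illustration), the IDI could be computed as follows. The function and argument names are hypothetical.

```python
from typing import Optional


def individual_deprivation_index(
    low_education: Optional[bool],
    manual_worker: Optional[bool],
    tax_exempt: Optional[bool],
) -> Optional[int]:
    """Sum three binary deprivation indicators into a 0-3 index.

    Returns None when any item is missing, which mirrors the classification
    of women with incomplete individual SEP data as non-respondents.
    """
    items = (low_education, manual_worker, tax_exempt)
    if any(item is None for item in items):
        return None
    # Each indicator contributes 1 when it signals deprivation, 0 otherwise.
    return sum(1 for item in items if item)
```

A manual worker with a high school education who is exempt from tax would score 2 under this sketch.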
Area-level SEP measure
Women’s address at diagnosis was collected and geolocated to obtain the corresponding IRIS (“Îlots Regroupés pour l’Information Statistique”), the smallest geographical unit for which statistical information was available in France at the time of the study. An IRIS comprises on average 2000 inhabitants who are relatively homogeneous in terms of socioeconomic characteristics. We used the 2007 French version of the European deprivation index (EDI), which is available at the IRIS level. The EDI was developed with the aim of being transposable to other European countries [13]. To date, the EDI exists for France, Italy, Spain, Portugal and England [14]. Each IRIS was assigned an EDI value calculated from the 2007 census data. A high value indicates a high level of deprivation. Quintiles of the national distribution of EDI were computed from all the IRIS of metropolitan France (excluding overseas regions), from quintile 1 (least deprived IRIS) to quintile 5 (most deprived IRIS).
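The assignment of an IRIS to a national quintile might look like the sketch below. It assumes the cut points are taken from the list of EDI values of all metropolitan IRIS and uses the default interpolation of Python's `statistics.quantiles`; the exact quantile definition used for the published EDI quintiles is not stated here, so boundary cases may differ. Function names are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles
from typing import List, Sequence


def quintile_cut_points(national_edi: Sequence[float]) -> List[float]:
    """Return the four cut points splitting the national EDI distribution into fifths."""
    return quantiles(national_edi, n=5)


def edi_quintile(edi_value: float, cut_points: Sequence[float]) -> int:
    """Map an EDI value to quintile 1 (least deprived) to 5 (most deprived).

    A value exactly equal to a cut point is assigned to the lower quintile.
    """
    return bisect_left(cut_points, edi_value) + 1
```

The cut points are computed once from the national distribution and then applied to the IRIS of each woman in the study, so that the quintiles refer to metropolitan France rather than to the study sample.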
Individual-level SEP measure with missing data imputed
To impute missing values of IDI, we used the multiple imputation by chained equations procedure in STATA to generate 20 sets of plausible values of IDI based on an imputation model depending on EDI and covariates [15, 16]. The imputation model was constructed using variables assumed to be associated with IDI: age, health insurance scheme (scheme and supplementary universal healthcare coverage, which is granted to patients whose income falls below an entitlement threshold), tumour characteristics (human epidermal growth factor receptor 2 (HER2), oestrogen and progesterone receptor status, Scarff-Bloom-Richardson (SBR) grade), place of residence (geographical area, rural/urban classification, and distance to the nearest gynaecologist), and EDI, which has been shown to be a relatively good predictor of individual deprivation [17]. Following Sterne et al., who recommend including the outcome in the imputation model, stage at diagnosis was included in the imputation models [18]. This resulted in an imputed IDI (i-IDI) coded in the same way as IDI.
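Estimates obtained from multiply imputed datasets are conventionally combined with Rubin's rules. The text does not describe the pooling step, so the sketch below is an assumption about how the 20 sets of results could be combined and does not describe the authors' procedure. The function name is hypothetical; its inputs are one coefficient estimate and its squared standard error per imputed dataset.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool per-imputation estimates; return (pooled estimate, pooled standard error)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and one variance per estimate")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

The between-imputation term inflates the standard error to reflect the uncertainty due to the missing IDI values, which a single imputation would ignore.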
Statistical analysis
We first tested (using chi-square tests) for selection bias due to the exclusion of women with incomplete data on IDI. Indeed, compared to respondents, non-respondents might more often have a low SEP and an advanced stage at diagnosis. We then tested the association between SEP and advanced stage at diagnosis under different configurations: i) First, we compared the results obtained using either IDI or EDI among women with complete data on SEP (model 1: complete case analysis). We assumed that, due to the aggregated nature of EDI, results using this index are subject to measurement error, leading to a lack of accuracy and a loss of power in the assessment of individual SEP-related differences in stage at diagnosis; ii) Then we studied the association between SEP and advanced stage at diagnosis using EDI among women for whom individual measures of SEP were missing (model 2: missing case analysis), in order to approximate what the effect on the results might have been had these women been included in the model 1 analyses; iii) Next, we studied the association between SEP and advanced stage at diagnosis using i-IDI among respondents and non-respondents (model 3: imputed case analysis). We assumed that in this configuration the selection bias observed in complete case analysis would be reduced, as non-respondent women would be included. Given the larger sample size compared to complete case analysis, we expected to observe a stronger association between i-IDI and stage at diagnosis than that observed using IDI; iv) Finally, we studied the association between SEP and advanced stage at diagnosis using EDI among women with data on either IDI or EDI (model 4: proxy measure analysis). As in imputed case analysis, we assumed that the selection bias observed when excluding women with missing individual SEP would be reduced in this configuration.
However, we assumed that the observed associations would be attenuated compared to results using i-IDI, due to the loss of accuracy and power deriving from the use of an ecological index to approximate the individual level. We focused on the crude association between SEP and stage at diagnosis and thus did not adjust our models for confounders, including age. In sensitivity analyses, the models were adjusted for age. We also excluded EDI from the imputation model and examined the effect on the model 3 results, in order to assess the contribution of EDI to the imputation. For each analysis, we used logit models with advanced stage at diagnosis as the outcome. We used the area under the ROC curve (AUC) to compare models using different SEP measures in terms of classification errors. This was done using the test based on the algorithm of DeLong et al. [19]. We used 0.2 and 0.05 as statistical significance levels for the comparison between respondents and non-respondents and for the regression analyses, respectively. Statistical analyses were performed with STATA software version 14 (StataCorp LP, College Station, TX, USA).
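For illustration, the AUC of a model can be computed as the proportion of (advanced, early) pairs in which the woman with advanced stage receives the higher predicted score, with ties counted as one half. The sketch below is a plain Python implementation of that definition only; it does not implement the DeLong test used to compare AUCs, and the function name and the 0/1 coding of the outcome are assumptions.

```python
from typing import Sequence


def auc(scores: Sequence[float], advanced: Sequence[int]) -> float:
    """Area under the ROC curve for scores predicting advanced stage (1) vs early stage (0)."""
    pos = [s for s, y in zip(scores, advanced) if y == 1]
    neg = [s for s, y in zip(scores, advanced) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome groups must be present")
    # Count concordant pairs; tied scores contribute one half.
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return concordant / (len(pos) * len(neg))
```

Ties are frequent when the only predictor is a four-level index such as IDI or a five-level EDI quintile, which is why they are handled explicitly. An AUC of 0.5 corresponds to no discrimination between early and advanced stage.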