Validation of data quality in the Swedish National Register for Breast Cancer

Background The National Breast Cancer Register (NBCR) of Sweden was launched in 2008 and is used for quality assurance, benchmarking, and research. Its three reporting forms encompass Notification, Adjuvant therapy and Follow-up. Target levels are set by national and international guidelines. This national validation assessed data quality of the register. Methods Data recorded through the Notification form were evaluated for completeness, timeliness, comparability and validity. Completeness was assessed by cross-linkage to the Swedish Cancer Register (SCR). Comparability was analyzed by comparing registration routines in NBCR with national and international guidelines. Timeliness was defined as the difference between the earliest date of diagnosis and the reporting date to NBCR. Validity was assessed by re-abstraction of medical chart data for 800 randomly selected patients diagnosed in 2013. Results The completeness of the NBCR was high with a coverage across regions and years (2010–2014) of 99.9%. Of all incident cases reported to the NBCR in 2013 (N = 8654), 98.5% were included within 12 months and differences between health regions were essentially negligible. Coding procedures followed guidelines and were uniformly adhered to. The proportion of missing values was < 5% for most variables and reported information generally had high exact agreement (> 90%). Conclusions Completeness of data, comparability and agreement in the NBCR was high. For clinical quality purposes and benchmarking, improved timeliness is warranted. Assessment of validity has resulted in a thorough review of all variables included in the Notification form with clarifications and revision of selected variables.


Background
The national cancer inquiry in 2005 concluded that cancer care in Sweden, although keeping a high standard, had several inequities both in its structure, its process and in outcome [1]. Regional cancer centers were established through the Association of Local Authorities and Regions, for buildup of national cancer registers [2]. For the most prevalent cancers, regional registers were already established and formed the basis for many outcome studies.
The National Breast Cancer Register (NBCR) has been operating since 2008 and collects data in a national common database. It encompasses the diagnostic and therapeutic processes and outcome for all primary invasive and in situ breast cancer cases. Registration is performed via the web-based INCA platform (Information Network for Cancer care). Quality indicators proposed by the National Board of Health and Welfare mirror the care process. Coding routines follow national and international classification guidelines. Cancer staging and TNM classification followed the AJCC Cancer Staging Manual 7: th edition and the TNM Classification of Malignant Tumours, UICC 7: th edition [3,4].
The register consists of three sections, Notification (including planned adjuvant therapy), Adjuvant therapy and Follow-up. Target levels are set by national and international guidelines. Continuous revisions and updates of the variables makes it a dynamic work tool. The responsibility for reporting lies on the individual health care providers and data are further monitored by the six Regional Cancer Centers located across Sweden's health care regions. The NBCR steering committee has national multi-professional and multi-disciplinary team members and representatives from the breast cancer survivor group. Individuals can actively opt out from registration although this is extremely rare.
Based on register data, the National Board of Health and Welfare has previously published reports for several cancer diagnoses that assess and follow-up on defined quality indicators such as compliance and timeliness [5]. The reports serve as audits, quality assurance and benchmarks. Other stakeholders such as the public, patient representatives, purchasers of health care and decision-makers make use of reported data. Register data also provides a resource for clinical and epidemiologic research.
In 2013, the NBCR steering committee decided to conduct a nationwide validation of the recorded data based on a manual (AKI) developed by the working group for quality registers and INCA [6]. The manual builds upon the validation strategy of cancer registry data proposed by Parkin and Bray [7,8] and includes the following four quality dimensions; timeliness, completeness, comparability and validity. This study presents the results of the nationwide validation of the NBCR and aims to describe how the results have been instrumental for improving the register through revision of the included variables, the reporting forms and its manual, and to assist in training of data managers.

Methods
For evaluation of timeliness, all incident cases reported to the NBCR in 2013 were included (N = 8654), and the difference in time between the earliest date of diagnosis and the reporting date in the registry was calculated.
Completeness was assessed by comparing the cases in the NBCR with registrations in the Swedish Cancer Registry (SCR) [9], to which reporting is mandatory according to the National Board of Health and Welfare's regulations (SOSFS2006:15). Data from the time period 2010-2014 was used. The completeness of the SCR is secured as any diagnosed cancer case is reported by the clinician and from the pathology laboratory after verification of morphological examinations i e biopsies and autopsy. Two publications describe in detail the process [10,11].
Comparability refers to the recording and coding practices and should be clear, nationally uniform and follow international guidelines to enable comparisons between regions and countries. Inclusion criteria are: location (primary breast cancer); sex (women and men); age (all ages); morphology (invasive breast cancer and carcinoma in situ); basis for diagnosis (all cases except diagnosis at autopsy).
Two control functions secure comparability. Firstly, the manual and the report form are unique documents. Secondly, monitoring is performed at the regional cancer centers whereby adherence to inclusion criteria and or any erroneously reported data and or ambiguity will be corrected.
Comparability concerning the workflow was assessed by a questionnaire addressing how different breast units handled reporting routines, involved staff, time allotted, and management support [12].
To assess validity, re-abstracted data from medical records was compared to the reported data via an independent review process. Eight hundred recorded cases between September 2013 and January 2014, were randomly selected using a two-stage cluster sampling plan.
Two hospitals offering breast cancer services (ranked according to size) from each health care region were selected. Within each region (cluster), a subsample of all breast cancer patient records in the 12 selected hospitals were drawn with a probability proportional to the size of region and hospital. The sampling plan was chosen to ensure national representation as well as participation from both large and small breast cancer units.
Re-abstraction of medical records took place in the second part of 2014 and was performed by three specialist nurses with previous experience in register validation and monitoring, henceforth referred to as validators. The re-abstracted information was entered into a specially designed module and subsequently merged with the originally recorded data to calculate exact data agreement. Exact agreement corresponds to the proportion of women for whom the data recorded in the NBCR is the same as in the validation data set. Missing observations were also included in the calculations of exact agreement to account for the plausible situations when 1) data had been reported to the NBCR but could not be found in the medical records, 2) the information was available in the medical records but had not been reported to the NBCR. Strength of agreement was measured by Cohen's Kappa (κ) scores for categorical variables, including 95% confidence intervals (CI), and Pearson correlation coefficients (r) for numerical variables.

Timeliness
Timeliness of reporting showed wide regional variations within 3 months, ranging from 30.2% in the Southeast to 77.4% in the Uppsala-Örebro regions. In 2013, 83.8% of all incident cases had been reported to NBCR within 6 months. At 12 months, 98.5% had been reported (Fig. 1). Differences between the regions were essentially negligible after 1 year (Table 1).

Completeness
The average coverage across all healthcare regions during 2010-2014 was 99.9%. The difference in coverage between regions was small for all calendar years ( Table 2).

Comparability
Eleven of 12 units responded to the questionnaire about the workflow in the reporting process. Nurses and doctors were mainly responsible for reporting to the NBCR, and responses indicated local and regional differences concerning the workflow and routines. Notable is that concerning allotted time resources for registration, five responded, affirmative and six negating. Weather support from the departmental leadership prevailed the responders described a strong formal support but a weak actual support and insufficient resource allocation.

Validity
Of the 800 patients selected for re-abstraction of medical records, one individual had been incorrectly registered and was excluded. The number of missing observations, the exact agreement between the originally recorded and the re-abstracted data is summarized in Table 3.
A detailed summary of each variable included in the validation can be found in the publicly available report (Swedish only) [12].

Lead times
The recorded variables ("Date first contact", "Date first visit to breast unit", "Date of first diagnosis", "Date for care plan", "Date of surgery") were close to complete in the NBCR (≥ 99.9%). There have been historical ambiguities regarding the definition of the variable "Date first contact" which refers to the date of first contact with the specialist clinic/breast unit, but the extracted data in this material correlated highly with the information recorded in the NBCR (r = 0.91) (Fig. 2). Overall, the correlation coefficients were 0.90 or higher for all lead time variables referring to dates prior to start of treatment.
The completeness of the variables "Date of first postoperative histopathological result" and "Date of adjuvant therapy decision" in the NBCR was slightly lower (92.5 and 92.7%, respectively). However, for both variables the correlation with the information recorded in the medical records was high (r > 0.95).
The exact agreement for the variable "Multidisciplinary Treatment Conference" was high (95%), but the κ zero as 100% of the original records that negated that a  multidisciplinary meeting had taken place (n = 17) were classified differently in the re-abstracted data (Fig. 4).
The validity of receptor status variables, ER (Estrogen receptor) status (exact agreement 91%, κ = 0.75, 95% CI: 0.70-0.79), PR (Progesteron receptor) status (exact agreement 85%, κ = 0.72, 95% CI: 0.67-0.76) and KI67 (antigen KI-67) (exact agreement 74%, κ = 0.64, 95% CI: 0.60-0.68) was good when classified as binary variables (No/Yes). Moreover, the correlation between the percentages of immunohistochemical staining, reported to the NBCR and those recorded in the medical records was also strong (ER:r = 0.97, PR:r = 0.94, KI67:r = 0.97). The agreement for the variable HER2-neu (Human Epidermal Growth Factor receptor 2) was, however, weaker (exact agreement 73%, κ = 0.48, 95% CI: 0.43-0.51). HER2 analysis is more time consuming and the results arrive later. Failure to report or to make a distinct notification in the patients file about the test result occurred in 116 cases where the validators recorded that a HER2 analysis had not been carried out despite reported as negative. The validity for "HER2-neu" was heterogeneous across all healthcare regions [12]. Regarding the biologic tumor variables, the proportion in situ cases were overrepresented with respect to missingness (proportion of missing in the NBCR is 30-31% for ER, PR, Ki67 and 34% for HER2-neu).

Discussion
Register data contains an abundance of valuable information useful for improvement of care on a population basis. This validation study showed that data from the Notification form is of high quality and that the validity is generally good. There are to our knowledge few examples  of validation concerning process and outcome data of cancer quality registers [13][14][15]. However, several validation studies are published on national cancer registers [16][17][18][19]. Studies based on cancer registries comparing time from diagnosis to treatment between different cancer forms have also been published [20,21]. An enquiry commissioned by the Ministry of Health and Welfare on, waiting times for diagnosis and treatment derived from three Swedish quality registers found large variations between the different diagnoses breast, colorectal and prostate cancers [22]. Some aspects of the investigated quality dimensions were identified with obvious improvement options. Variables with information considered insignificant were identified. As the registers aim to deliver process data they also serve as a source for clinical research where the demand for detailed information is important. This balance between sufficient and relevant data serving both purposes is challenging. The main findings of each of the four quality dimensions are discussed below.

Timeliness
For quality purposes and benchmarking improved timeliness is warranted. Lead times measuring waiting times need to be readily available. Timeliness has the potential to alert care givers of short comings in the breast cancer process. On the other hand, many indicators need to be analyzed in significant numbers and over time to make sound conclusions. Lag in completed registrations can be explained in part by the complex adjuvant treatment protocols like primary systemic treatment. Previous reports on cancer register validation from Sweden corroborate our results on deficiency in timely reporting [13,14]. The observed wide variations in timeliness is assumed to be a consequence of the work flows that differ within the regions and also mirroring diversities in resource allocation. Those units with more efficient workflows could be useful as point of reference. Registration in real time to the NBCR would be the ideal solution and enhance and secure quality of data entries. Efficiency drops when users report retrospectively from medical charts. The optimum would be communicating systems enabling automatic  transfer of register data from medical records. An interim solution is allocation of more administrative staff for swift reporting to the NBCR.

Completeness
The registry's coverage, when compared against the SCR is high from a national and regional perspective. A slightly lower reporting rate was found in some regions the last year of observation for the period 2010-2014. This reiterates the need for timely reporting, where reported data are displayed online before the monitoring step occurs. The goal is to introduce a structured medical record template for direct transmission of data to the register.

Comparability
Several items concerning surgery with insignificant information were identified; reason for non-surgical primary treatment and indication for reoperation. Revision surgeries related to postoperative complications was deemed insignificant as only major surgery was a variable. Implementation of standard registration according to i.e. Clavien would be preferable [23,24]. As morphologic subtype is no longer basis for treatment these variables were also considered redundant. Information regarding planned adjuvant treatment is reported since 2008 as a surrogate for received treatment.
The histopathological report contained data that should be relevant for treatment decision. During the period of 2008-2015 the completeness of pathology reports increased, with more than 80% being adequately reported today due to synoptic reporting. It has both facilitated reporting from the pathologist and reporting to the NBCR due to succinct definitions of variables. Avoiding ambiguous statements in free text reports have increased quality, accuracy, workflow and thereby timeliness in a positive direction. A correct description of the size of the malignancy was challenging and extent of invasive and in situ tumors, numbers of tumors also showed inconsistency between reported and validated information. A revision of the variables giving fewer options would result in further improved quality. The Quality and  Fig. 3 a Concordance, as measured by the Kappa score and 95% confidence intervals (CI), in validated variables related to diagnostics, stage, planned care and surgical information. b Concordance, as measured by the Kappa score and 95% confidence intervals (CI), in validated variables related to cytopathology and postoperative treatment Standardization Committee in the Swedish Society of Pathology (KVAST) [25] collaborate in the update of guidelines and quality of laboratory analyses. Each individual laboratory should define their own threshold value for receptor status and Ki67 and in the 2015 form "Ki67 status" was recoded according to local cut-off values (Low, High, Not Done, Not available / missing data).
In our experience there are challenges in capturing results of Her2 analysis. To clarify, the results of Her2-neu gene expression when tested with in situ hybridization (ISH) is delayed compared to the rest of the pathology report based on immunohistochemical staining. Our estimation is that when Her2 ISH results arrives later it serves as a basis for therapy recommendation but the care giver has failed in reporting the added information to the register's report form.
Ambiguities in the manual probably explained the poorer results with respect to consistency found in variables related to adjuvant treatment. In the section related to planned treatment the low consistency was most likely due to failure to distinguish between planned and given treatment in the medical records at reabstraction. Revision of equivocal variables was needed.

Validity
The proportion of missing values in the database, INCA among the randomly selected patients was lower than 5% for most variables covered by the Notification form. The reported information had generally high exact agreement (> 90%) and/or κ-score. Surprisingly the variable regarding women diagnosed through the screening program, previously considered unreliable, was found to be highly consistent. Assessment of validity has resulted in a thorough review of all variables included in the Notification Form and the project group have proposed clarifications and certain variables to be removed.

Consequences of the validation study
The following variables were revised: Invasivenessmixed forms with in situ components were excluded; Histopathological size of cancer in situreplaced by Extent; Ki67 status -to be reported according to local cut off levels instead of a uniform national cut off level; Recommended postoperative adjuvant treatmentreplaced by de facto administered therapy reported in the form Adjuvant treatment which matured with acceptable completeness. The following variables were omitted: Reason for no primary surgery; Contralateral risk reducing mastectomy; Reason for completion axillary surgery; Reoperation due to early postoperative complication in breast or axilla; Multifocal canceras the variable Number of invasive tumors in the breast was more reliable; Lymph vascular invasion.

Conclusions
Completeness of data, comparability and agreement in the NBCR was high. The current validation has served to revise and omit insignificant variables in the register. Timeliness in reporting showed long lag times which makes data less useful for clinical purposes. The regional differences found are likely explained by variations in workflow. Timeliness seem to be the main challenge. Concomitantly improvements in delivery of real time data have reinforced the impetus to accelerate reporting. In addition, the structured care plan on reducing waiting times in cancer care initiated and implemented in 2016 has also put pressure to improve the process to reach set targets.