Data quality of drug-resistant tuberculosis and antiretroviral therapy electronic registers in South Africa

Background To assess the quality and completeness of treatment and outcome data in the electronic tuberculosis (TB) and antiretroviral treatment (ART) registers for drug-resistant (DR-) TB patients at three treatment facilities in South Africa. Methods We did a retrospective cohort study using routinely collected data from the DR-TB registers on rifampicin-resistant adults (≥18 years old) on ART who initiated DR-TB treatment between January 2012 and December 2013. We linked patient information from the DR-TB register to the ART register using patient identifiers and an algorithm based on string edit distance and date of birth. We describe the data gaps and discrepancies found. Results Overall, 2852 DR-TB patients met our inclusion criteria based on the DR-TB register data, and of these, 1685 (59%) could be matched to the ART registers. An additional 253 patients from the DR-TB registers were found in the ART registers, having initiated ART, even though the DR-TB register indicated that they were not on ART (or these data were missing). Of the matched patients, 11% did not have a TB treatment status recorded in the ART register despite being recorded as on TB treatment in the DR-TB register, and 78% did not have an ART start date recorded in the DR-TB register despite being on ART according to the ART register. In addition, 11% of matched patients had a death recorded in one register but not the other, and of those with a death recorded in both, 15% of death dates differed by > 1 month. Conclusions The underreporting of death and the missing ART or TB status in the electronic DR-TB and ART registers could undermine monitoring efforts by understating the state of the TB/HIV epidemic. Improved recording in these data sources, and data integration across systems, could improve the accuracy of reporting for the national HIV/ART and TB programmes.

the public healthcare system. While health facilities keep their own paper records and registers, they are required to report into the national electronic registers for monitoring of HIV and TB. Two TB registers are widely in use: the national electronic TB register (ETR.Net) for drug-sensitive TB and the electronic drug-resistant TB register (EDRWeb) for drug-resistant (DR-) TB. The Three Integrated Electronic Registers (TIER.NET) is a national monitoring and evaluation software system used to capture standardized data on patients on ART [3]. Introduced in 2011, it is used in over 3000 facilities in South Africa.
Effective monitoring of the national HIV and TB programmes relies on complete, accurate data. Knowing drug-resistant TB incidence within the HIV programme is critical. Furthermore, HIV infection rates and ART uptake amongst patients with TB are key indicators of TB and HIV programme integration. Accurate recording and reporting is crucial for programme monitoring, resource allocation, policy making and epidemiological analysis.
Although the implementation of TIER.NET has been evaluated [4], no studies have assessed the quality and completeness of its data. A few studies have assessed the quality and completeness of ETR.Net and EDRWeb [5][6][7][8]. A study of pediatric TB cases demonstrated that a third of these cases are never recorded in ETR.Net [5], and another study found that in EDRWeb, 12% of patients have a missing HIV status and 4% an incorrect HIV status [8]. Although 90% of adult patients with TB are recorded in ETR.Net [7], estimates of accurate reporting of HIV status range from 53.9 to 79% [6,7], and reporting of ART status is as low as 24% [7]. In a study comparing ETR.Net to clinic source documents, agreement on ART status and TB treatment outcome was low, at 59 and 47%, respectively [6]. A study comparing ETR.Net data with National Health Laboratory Service data found that 20% of smear-positive TB cases were not registered in ETR.Net and that only 69.9% had patient records at the facilities from which the sample was sent [9].
We linked routinely collected data from the DR-TB and ART registers using patient identifiers (name, surname, date of birth and, where available, national identification number). This paper aims to describe the methods used to match patients across routine data platforms, and the data gaps and discrepancies found.

Data sources
In 2016/17 we extracted EDRWeb and TIER.NET data for two South African healthcare facilities. At a third healthcare facility we evaluated, TB and HIV data were held in two other systems, which we also included. The Focal Point Information System (FIS) is an electronic TB patient record system that allows multi-point data entry and is used at three DR-TB treatment sites in South Africa. TherapyEdge-HIV™ (TE) is an electronic medical database system designed for HIV patient management and is currently used at six sites in South Africa. FIS and TE are more comprehensive than EDRWeb and TIER.NET in terms of the variables collected. These systems were implemented at each site at various time points between 2004 and 2013; at the time of data collection, all the sites had at least 3 years' experience with their respective systems. In most instances data were entered by dedicated data staff at the site, but they could also have been entered by other administrative or clinical staff. Some systems (TE and FIS) were donor funded and therefore had external support for their implementation and management. TIER.NET and EDRWeb are government-mandated systems, and their implementation and management are through the South African National Department of Health (NDOH). To our knowledge there was no external validation or oversight of these electronic registers.
As this study could not have been done without patient identifiers, a number of steps were taken to mitigate risks to patient confidentiality: 1) datasets were collected directly from each facility, password protected and transported on an encrypted hard drive; 2) access to the data was limited to a single person who performed all analyses; and 3) after matching, the analytic datasets were de-identified for all subsequent analyses.

Study sites
The three facilities were located in the Gauteng and KwaZulu-Natal provinces. One is an outpatient Non-Governmental Organization (NGO)-supported DR-TB clinic at a public academic hospital. Another is an MDR-TB hospital providing inpatient and outpatient care. The third is an outpatient DR-TB centre. To maintain anonymity, we labelled them Site 1, 2 and 3. Site 1 uses the FIS and TE systems, while Sites 2 and 3 use EDRWeb and TIER.NET.
Inclusion criteria for the study were: adults (aged 18+ years), laboratory diagnosis of rifampicin (RIF)-resistant TB, and initiation of second-line TB treatment between January 2012 and December 2013. HIV and ART status were primarily determined from either the DR-TB or the ART register. In both EDRWeb and FIS, ART status was recorded either as a binary yes/no variable for ART started or as an ART start date, and either was used to identify ART status. Because the DR-TB registers may have incomplete HIV status and/or ART initiation information, we then matched DR-TB patients who had a negative or missing HIV status, or who were HIV positive but not on ART (or whose ART status was unknown or missing) in the DR-TB register, to the ART register in order to confirm their HIV and/or ART status. Once HIV and ART status had been confirmed, subsequent analyses were restricted to patients who were HIV positive and on ART.
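The rule for deciding which DR-TB patients are sent to the ART register for confirmation can be sketched as below. This is a minimal illustration under stated assumptions: the record fields (`hiv_status`, `on_art`, `art_start_date`) and their encodings are hypothetical, since the registers' actual variable names are not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrTbRecord:
    # Hypothetical fields; the registers' real variable names may differ.
    patient_id: str
    hiv_status: Optional[str]      # "positive", "negative", "unknown", or None if missing
    on_art: Optional[bool]         # binary "ART started" flag, None if missing
    art_start_date: Optional[str]  # ISO date string, None if missing


def art_status_from_drtb(rec: DrTbRecord) -> Optional[bool]:
    """Either a 'yes' flag or a recorded ART start date counts as on ART."""
    if rec.on_art or rec.art_start_date is not None:
        return True
    if rec.on_art is False:
        return False
    return None  # ART status unknown or missing


def needs_art_register_check(rec: DrTbRecord) -> bool:
    """True for patients whose HIV and/or ART status must be confirmed in the ART register."""
    if rec.hiv_status != "positive":
        return True  # negative, unknown or missing HIV status
    return art_status_from_drtb(rec) is not True  # HIV positive but not (known to be) on ART
```

Treating a recorded start date as sufficient evidence of ART mirrors the text's statement that either the yes/no variable or the start date was used.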

Matching algorithm
We matched patients from the DR-TB register to their ART record by first pairing records in which the first letter of either the name or the surname agreed, and then using a generalized Levenshtein edit distance [10] to compare names and surnames between the two datasets. The edit distance measures how different two strings are: it is the cost of the cheapest sequence of edit operations that turns one string into the other, so lower scores indicate more similar strings. Second or third names were also considered, if recorded. Analyses were done in SAS version 9.3, using its built-in COMPGED function [11].
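The blocking and edit-distance steps can be sketched in Python as follows. This is not SAS COMPGED: COMPGED assigns different costs to different kinds of edits, whereas this sketch assumes a uniform cost of 100 per insertion, deletion or substitution, so its scores only approximate COMPGED's. The record keys `name` and `surname` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def generalized_edit_distance(a: str, b: str, insert_cost: int = 100,
                              delete_cost: int = 100, replace_cost: int = 100) -> int:
    """Weighted Levenshtein distance: cheapest total cost of turning a into b."""
    a, b = a.strip().upper(), b.strip().upper()
    # At the start of row i, prev[j] = cost of turning a[:i-1] into b[:j]
    prev = [j * insert_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * delete_cost] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else replace_cost
            curr[j] = min(prev[j] + delete_cost,      # delete a[i-1]
                          curr[j - 1] + insert_cost,  # insert b[j-1]
                          prev[j - 1] + sub)          # keep or replace a[i-1]
        prev = curr
    return prev[-1]


def _same_initial(x: str, y: str) -> bool:
    x, y = x.strip().upper(), y.strip().upper()
    return bool(x) and bool(y) and x[0] == y[0]


def candidate_pairs(drtb: Iterable[Dict[str, str]],
                    art: Iterable[Dict[str, str]]) -> Iterator[Tuple[dict, dict]]:
    """Blocking step: only pair records whose name or surname share a first letter."""
    art = list(art)
    for d in drtb:
        for r in art:
            if _same_initial(d["name"], r["name"]) or _same_initial(d["surname"], r["surname"]):
                yield d, r
```

Under this uniform-cost assumption, a threshold of 100 tolerates roughly one edit (for example "THABO" versus "TABO"), and blocking limits the number of pairs whose distance has to be computed.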
Patients were marked as a match if their names in the two datasets had an edit distance of 100 or less [12,13] and their birthdates matched. For a birthdate match, we allowed the year of birth to differ by up to 1 year and the month of birth to differ by up to 1 month, and we also allowed for the day and month of birth to have been transposed in one of the datasets. For patients with a missing date of birth, we matched on age and gender alongside the names, allowing for a 1-year difference in age. Patients were also considered a match if their South African identification numbers matched exactly. In addition, where the surname and date of birth matched exactly between the two datasets, we were more lenient on the edit distance for the names (first, second or third name), accepting a score of 500 or less.
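The decision rules above can be summarized in a sketch like the following. Assumptions: each birthdate tolerance (year ±1, month ±1, day/month swap) is applied on its own rather than combined; `name_distance` is a precomputed edit-distance score (for example, the smallest score across first, second and third names); the standard threshold of 100 is also used for names when the date of birth is missing; and the dictionary keys are hypothetical.

```python
from datetime import date
from typing import Optional

NAME_THRESHOLD = 100          # usual edit-distance cut-off for names
LENIENT_NAME_THRESHOLD = 500  # cut-off when surname and date of birth match exactly


def dob_compatible(d1: date, d2: date) -> bool:
    """Birthdates agree, allowing year +/-1, month +/-1, or day/month transposed.
    Assumption: each tolerance is applied on its own, not combined."""
    if d1 == d2:
        return True
    if (d1.month, d1.day) == (d2.month, d2.day) and abs(d1.year - d2.year) <= 1:
        return True
    if d1.year == d2.year and d1.day == d2.day and abs(d1.month - d2.month) <= 1:
        return True
    # day and month swapped between registers, e.g. 3 July vs 7 March
    return d1.year == d2.year and (d1.day, d1.month) == (d2.month, d2.day)


def is_match(drtb: dict, art: dict, name_distance: int) -> bool:
    """Hypothetical record keys: sa_id, dob, surname, sex, age.
    name_distance: precomputed edit-distance score for the names."""
    sa_id = drtb.get("sa_id")
    if sa_id and sa_id == art.get("sa_id"):
        return True  # exact South African ID number match
    d1: Optional[date] = drtb.get("dob")
    d2: Optional[date] = art.get("dob")
    if d1 is None or d2 is None:
        # missing birthdate: names plus gender and age (+/-1 year)
        return (name_distance <= NAME_THRESHOLD
                and drtb["sex"] == art["sex"]
                and abs(drtb["age"] - art["age"]) <= 1)
    if drtb["surname"].strip().upper() == art["surname"].strip().upper() and d1 == d2:
        return name_distance <= LENIENT_NAME_THRESHOLD
    return name_distance <= NAME_THRESHOLD and dob_compatible(d1, d2)
```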

Validation of matching
A random sample of 10% of each healthcare facility's DR-TB patients meeting our inclusion criteria was selected. For these patients, we manually searched the corresponding ART register; this manual matching was considered the 'gold standard'. Against it, we calculated the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the matching algorithm.
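The four validation measures follow from a 2×2 table of algorithm versus manual matches. The sketch below shows the standard formulas; the argument names are illustrative, and any counts passed to it in an example are made up, since the study's own 2×2 table is not reported in this section.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Algorithm vs manual ('gold standard') matching.
    tp: both say match; fp: algorithm only; fn: manual only; tn: neither."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true matches the algorithm found
        "specificity": tn / (tn + fp),  # share of true non-matches it rejected
        "ppv": tp / (tp + fp),          # share of algorithm matches that are true matches
        "npv": tn / (tn + fn),          # share of algorithm non-matches that are true non-matches
    }
```

Because PPV and NPV depend on how common true matches are in the sample, they can differ between sites even when sensitivity and specificity do not.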

Statistical analysis
We compared demographics, TB characteristics and outcomes between those patients we could match and those we could not. We describe completeness and discrepancies of common variables in both ART and DR-TB registers. For reporting of death outcomes and ART start dates we calculated a Kappa statistic to measure agreement between the two data sources. The severity of date discrepancies was assessed by calculating the proportion of dates differing by > 1 month.
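The agreement and date-discrepancy measures can be sketched as follows. Assumptions: the kappa is Cohen's kappa for two binary ratings per patient (the text says only "a Kappa statistic"), and "> 1 month" is taken as more than 31 days apart, a definition the text does not state.

```python
from datetime import date
from typing import Sequence, Tuple


def cohens_kappa(pairs: Sequence[Tuple[bool, bool]]) -> float:
    """Cohen's kappa for paired binary ratings, e.g. (death in DR-TB register, death in ART register).
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(pairs)
    both = sum(1 for x, y in pairs if x and y)
    only_x = sum(1 for x, y in pairs if x and not y)
    only_y = sum(1 for x, y in pairs if y and not x)
    neither = n - both - only_x - only_y
    p_observed = (both + neither) / n
    # chance agreement from each register's marginal proportions
    p_x, p_y = (both + only_x) / n, (both + only_y) / n
    p_expected = p_x * p_y + (1 - p_x) * (1 - p_y)
    return (p_observed - p_expected) / (1 - p_expected)


def differs_by_more_than_a_month(d1: date, d2: date, month_days: int = 31) -> bool:
    """Assumption: '> 1 month' means more than `month_days` days apart."""
    return abs((d1 - d2).days) > month_days
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most patients have no death recorded in either register.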

Results
Based on the DR-TB register data, 2852 adult DR-TB patients who were HIV positive and on ART initiated TB treatment between January 2012 and December 2013; of these, 1685 (59%) could be matched to the ART registers. Of the 1526 patients who were reportedly not on ART (or for whom these data were missing), 253 were found in the ART registers, having initiated ART. This brought the total number of matched patients to 1938 (62%) out of 3105 DR-TB patients included in the subsequent analyses. While the matching rate was high at Sites 1 and 2 (76 and 80%, respectively), we could match only 59% of patients at Site 3. There were no substantial differences in age, gender, new treatment status, extrapulmonary TB or TB outcome between matched and unmatched patients (Table 1), though the number of characteristics we could assess was limited.
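The two headline matching proportions use different denominators. Using the counts reported in the paragraph above, the arithmetic is:

```latex
\begin{aligned}
\text{matched} &= 1685 + 253 = 1938, \\
\text{included} &= 2852 + 253 = 3105, \\
\frac{1685}{2852} &\approx 0.591 \;(59\%), \qquad \frac{1938}{3105} \approx 0.624 \;(62\%).
\end{aligned}
```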
To validate our matching algorithm, we randomly sampled 171 patients and compared the algorithm's results with our "gold standard" of manual matching. The algorithm had a sensitivity of 87%, a specificity of 99%, a PPV of 98% and an NPV of 94%. Specificity and PPV were reduced by 1 and 2 percentage points, respectively, because one patient was misclassified in the manual matching process.
Discrepancies between the ART and DR-TB registers are presented in Table 2. Overall, 8.1% of patients had a discrepant date of birth; however, the matching algorithm was more lenient on these if the name and surname had already matched. Another 2.5% of patients had a date of birth missing in one of the registers, so these patients would have been matched on age, alongside gender and names. All patients at Site 1 had their HIV status recorded as positive in the DR-TB register, but 6.2 and 4.5% of patients at Sites 2 and 3, respectively, had a negative, unknown or missing HIV status in the DR-TB register despite having been initiated on ART. Overall, 8.7% of patients had no record of TB treatment (i.e. it was missing) in the ART register, and 2.5% were recorded as not being on TB treatment (i.e. TB treatment was incorrectly recorded as "No") in the ART register. Of the 217 patients whose TB treatment status was incorrectly recorded in the ART register, 82% had initiated ART before TB treatment, so this is likely due to records not being updated. However, of all patients who had initiated ART before TB treatment (61%, n = 1187), the majority (85%) had their TB treatment status recorded correctly, indicating that most of these records had been updated once TB treatment started.
The majority (87%) of patients had their ART status correctly recorded in the DR-TB register. At Site 1, the ART start date was recorded in both the ART and DR-TB registers for most patients (81.2%), but 67.8% of these dates were discrepant, and three quarters of the discrepant dates differed by > 1 month. Also at Site 1, 12% of patients had a missing ART start date in the ART register, even though the data indicated that they had initiated treatment. Most patients at Sites 2 and 3 did not have an ART start date recorded in the DR-TB register (99.7 and 80.5%, respectively). Of the 280 patients at Site 3 who did have an ART start date recorded in both registers, 15.4% had discrepant dates, with 68.7% of ART start dates differing by > 1 month. Overall, there was poor agreement in the reporting of the ART start date between the ART and DR-TB registers (Kappa = 0.001). Sites 2 and 3 did not have ART regimen information recorded in the DR-TB register, so we could not compare the two datasets. At Site 1, where the ART regimen was recorded in both the DR-TB and ART registers for 44.3% of patients, it agreed for the majority of them (86.4%). However, 11.4% of patients at Site 1 did not have an ART regimen recorded in the ART register.
Overall, 7.3% of patients had a death reported in the DR-TB register but not the ART register. Of these, 80.9% were reported in the ART register as lost to follow-up, 14.9% as transferred out, and 2.8% had no outcome reported. While these discrepancies may be explained by the timing of when information is recorded, a large proportion of these deaths (45.4%) were reported in the DR-TB register less than 3 months after the non-death outcome in the ART register, and 36.9% had death dates that preceded the non-death outcome date in the ART register.
Overall, 3.5% of patients had a death reported in the ART register but not the DR-TB register. Of these, 56.7% were reported as lost to follow-up in the DR-TB register, 16.4% had no outcome reported, and one patient was reportedly transferred out. The remaining patients had TB treatment-specific outcomes reported: 11.9% cured, 9.0% completed treatment, 3.0% still on treatment and one patient failed TB treatment. Of the 67 patients with a death reported in the ART register but not the DR-TB register, 29.9% were reported as deaths in the ART register < 3 months after the non-death outcome in the DR-TB register, and 34.3% had death dates preceding the non-death outcome date in the DR-TB register. Overall, agreement in the reporting of death between the DR-TB and ART registers was good (Kappa = 0.60).
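The timing categories used for discordant death outcomes can be sketched as a small classifier. Assumptions: "< 3 months" is taken as fewer than 91 days, and a death dated on the same day as the non-death outcome counts as "after"; the text specifies neither.

```python
from datetime import date
from typing import Optional


def classify_discordant_death(death_date: date, other_outcome_date: Optional[date]) -> str:
    """Timing of a death recorded in one register relative to a non-death outcome
    (e.g. lost to follow-up, transferred out) recorded in the other register."""
    if other_outcome_date is None:
        return "no comparable outcome date"
    if death_date < other_outcome_date:
        return "death before non-death outcome"
    if (death_date - other_outcome_date).days < 91:  # assumed: '< 3 months' = under 91 days
        return "death < 3 months after non-death outcome"
    return "death >= 3 months after non-death outcome"
```

A death dated before, or shortly after, a "lost to follow-up" outcome in the other register is the pattern the text points to as unlikely to be explained by recording delays alone.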

Discussion
The electronic DR-TB and ART registers, used by the majority of public healthcare facilities across South Africa, are crucial for monitoring treatment programme outcomes and have improved the quality of reporting to the South African NDOH and to international organizations such as the World Health Organization. They have also allowed data access for various research organizations. However, they are not without weaknesses. Our analysis has shown underreporting of crucial indicators of patient outcomes, in particular ART start dates and death outcomes. Close to 11% of patients had a death reported in the DR-TB register but not the ART register, or vice versa. The majority of the corresponding non-death outcomes were reported as lost to follow-up, suggesting that patients were marked as such when they failed to return to the facility. Although these discrepancies are likely explained by the design or timing of data entry in either system, they remain problematic for researchers and government who analyze these individual data systems at face value for monitoring purposes; additional follow-up is required to ensure that deaths are not underreported.
A possible solution for confirming death outcomes would be to link the electronic registers to the South African national death registry; however, this would depend on a valid South African identity number being correctly recorded, and given the large patient numbers involved, linkage would need to be an automated and continuous procedure. We also found poor recording of ART information in EDRWeb. The majority of patients at the two study sites that used EDRWeb exclusively did not have an ART start date recorded (99.7 and 80.5%, respectively). Though FIS also had missing ART information, this was on a smaller scale than in EDRWeb. ART start dates and ART regimens are important for monitoring the DR-TB programme, as well as for monitoring drug interactions as new drugs are introduced, for example dolutegravir for ART and bedaquiline for TB. DR-TB diagnoses are also underreported in the ART programme (at Site 2, for example, just under half of patients did not have DR-TB status recorded in the ART register). Importantly, these discrepancies existed across sites and different systems, suggesting that this could be a problem at other similar healthcare facilities across South Africa.
Our matching algorithm performed reasonably well in linking patients between datasets; however, it is limited by the quality of the reporting of patient identifiers, and its sensitivity could be improved if patient identifiers were accurately recorded in both registers. This highlights the need for a unique patient identifier within the public health system, something planned for in the National Health Insurance Policy proposed by the South African government [14].

Conclusions
The quality and completeness of the electronic DR-TB and ART registers are concerning for researchers and government, who use them to monitor the outcomes of the South African TB and HIV programmes. Although South Africa's public healthcare system is understaffed and overburdened, and regular internal auditing of data may not be feasible, urgent attention is needed to address data gaps in the ART and DR-TB electronic systems.
In 2016, the NDOH implemented a TB module in TIER.NET, aiming to integrate TB and HIV data and increase reporting efficiency. While this TB module can capture DR-TB through drug sensitivity and TB regimen variables, EDRWeb will still be used to monitor DR-TB patients. The impact of the TIER.NET TB module is yet to be determined, and while data integration can improve the quality of these data systems, accurate reporting remains crucial to ensure good data quality and to allow these systems to have a meaningful impact on monitoring, policy, planning and epidemiological analysis. Furthermore, it is essential to provide intensive, ongoing training for staff who capture data into electronic registers, to place constant emphasis on data quality, and to improve patient file management and the transfer of information to the data capturing team.