Using data from patient interactions in primary care for population-level chronic disease surveillance: The Sentinel Practices Data Sourcing (SPDS) project

Background Population health planning within a health district requires current information on the health profile of the target population. Information obtained during primary care interactions may provide a valuable surveillance system for chronic disease burden. The Sentinel Practices Data Sourcing (SPDS) project aimed to establish a sentinel site surveillance system to obtain a region-specific estimate of the prevalence of chronic diseases and mental health disorders within the Illawarra-Shoalhaven region of New South Wales, Australia. Methods In September 2013, de-identified information for all patient interactions within the preceding 24 months was extracted and collated using a computerised chronic disease management program designed for desktop use (Pen Computer Systems Clinical Audit Tool™ (PCS CAT)). Collated patient data included information on all diagnosed pathologies and mental health indicators, clinical variables such as anthropometric measures, and patient demographic variables such as age, sex, geographical location of residence and Indigenous status. Age-standardised prevalence of selected health conditions was calculated. Results Of the 52 general practices within the 6 major Statistical Local Areas (SLAs) of the health district that met the inclusion criteria, 17 consented to participate in the study, yielding data on n = 152,767 patients and representing 39.7% of the regional population. Age-adjusted prevalence was higher than the national average for obesity/overweight (65.9% vs 63.4%), hypertension (11.9% vs 10.4%) and anxiety disorders (5.0% vs 3.8%), and lower than the national average for asthma (8.0% vs 10.2%).
Conclusions This proof-of-concept study has demonstrated that the scope of data collected during patient visits to general practitioners (GPs), facilitated through the Medicare-funded primary health care system in Australia, provides an opportunity to monitor chronic disease prevalence and its associated risk factors at the local level. Selecting sentinel sites that are representative of the population being served will facilitate an accurate, region-specific surveillance system for population health planning at the primary care level.


Background
Valid data on morbidity at the regional level is essential for planning primary healthcare services that are tailored to the needs, demands and requirements of the local population. The peer-reviewed literature describes multiple avenues of population health surveillance, including national surveys, administrative data and electronic health records. National surveys are a commonly used surveillance method [1,2], but they have several shortcomings. First, prevalence estimates obtained from national and state level surveys are based on measures self-reported by respondents, the accuracy of which is questionable [2]. Second, surveys do not cover all aspects of health and therefore do not provide a full and accurate picture of health [3], which limits their generalisability to smaller regional populations [4]. Third, different subgroups of the population may respond to surveys at different rates, which limits the generalisability of survey data to the general population [5,6]. Nevertheless, surveys yield prevalence rates comparable to those from routinely collected clinical administrative data [1-3], which suggests that clinical administrative data is a potential avenue for surveillance. Administrative data, including primary care medical records (data collected during general practitioner (GP) visits), physician billing, specialist visits, pharmacy data (prescription dispensation) and hospital data (inpatient/outpatient information), has been shown to provide reliable and valid prevalence estimates of chronic disease conditions [1-3,7,8].
Clinical administrative data is widely available [1], and systems for its collection and reporting are already in place at both primary and tertiary levels of care [9]. Further, administrative data is validated by the clinical judgment of medical practitioners and may be generalisable to smaller/regional populations [10]. Patient data entered into electronic medical record software at the point of contact with primary health care practitioners is often supported by diagnostic testing and clinical examination, and is thus likely to be more valid than self-reported health information [10]. The peer-reviewed literature therefore provides substantial current evidence on the effectiveness of using administrative and/or primary care data for population health surveillance; however, such data-based disease monitoring models are currently lacking in Australia. The Sentinel Practices Data Sourcing (SPDS) project aimed to implement a sentinel site surveillance system within the Illawarra-Shoalhaven region of New South Wales (NSW), Australia, to obtain region-specific prevalence estimates of chronic diseases and mental health disorders using patient data obtained during primary care interactions. A pre-tested method of data extraction [10] was used, with the aim of informing population health planning within health service catchments of regional Australia.

Methods
The study was conducted in the Illawarra-Shoalhaven region of NSW, Australia (Figure 1). Within NSW there are 15 Local Health Districts, which are responsible for acute, sub-acute and tertiary care service delivery through the public hospital system, and 17 Medicare Locals, which are responsible for primary healthcare planning and delivery for their constituent regions [10]. Unlike other regions, the geographical catchment boundary of the Illawarra-Shoalhaven Local Health District (ISLHD) is the same as that covered by a single Medicare Local, the Illawarra-Shoalhaven Medicare Local (ISML), which places the region in a unique and advantageous position for planning and implementing a chronic disease surveillance system. Additionally, the region has a diverse socio-economic profile, with pockets of both higher and lower socio-economic disadvantage when its Index of Relative Socio-Economic Disadvantage (IRSD) scores are compared with those for Australia as a whole (Figure 2). The IRSD is a composite summary measure constructed by the Australian Bureau of Statistics (ABS) for all regions in Australia and is based on income, educational attainment, employment status, occupation type, family structure, dwellings, house ownership, marital status and ethnicity [11].
The study undertook secondary analysis of administrative data through extraction of de-identified clinical patient information, and the project was rolled out in 4 phases:

Phase 1: Practice recruitment
The study aimed to recruit 12-18 practices within the Illawarra-Shoalhaven region based on the Statistical Local Area (SLA) geography and the demographic profile of the catchment. Eligible practices were identified by requesting from ISML a list of the region's practices that fulfilled the following inclusion criteria (n = 52):
✓ location in one of the 7 SLAs that represent the Illawarra-Shoalhaven region;
✓ more than one GP working at the practice site (solo practitioner sites are likely to have smaller patient numbers);
✓ employment of either more than one full-time GP or more than two part-time GPs (i.e. GPs who work at least 20 hours a week);
✓ installation of a clinical auditing software package on the practice desktop, or a desire to install and use the Pen Computer Systems (PCS) Clinical Audit Tool™ (PCS CAT) (multiple licences for PCS CAT have been procured by ISML, so it is freely available to all general practices within its catchment); and
✓ a willingness to provide de-identified practice data extracts to the researchers for surveillance purposes.
Seventeen general practices in the catchment volunteered and consented to participate in the study (17 of 52; response rate = 33%). Participating practices used only two electronic medical record software packages: Medical Director™ (n = 8) or Best Practice™ (n = 9).
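The inclusion criteria can be written as a screening function, which makes the eligibility rules explicit and repeatable if the sentinel network is ever re-sampled. This is a minimal sketch under stated assumptions: the `Practice` fields, the placeholder SLA labels and this particular reading of the staffing criterion are hypothetical and are not part of the SPDS protocol or of PCS CAT.

```python
from dataclasses import dataclass

# Placeholder labels for the 7 SLAs of the catchment (hypothetical, not real SLA names).
CATCHMENT_SLAS = {f"SLA_{i}" for i in range(1, 8)}


@dataclass
class Practice:
    sla: str
    full_time_gps: int
    part_time_gps: int                     # part-time GPs working at least 20 hours a week
    has_or_will_install_audit_tool: bool   # existing audit software, or willing to use PCS CAT
    willing_to_share_extracts: bool        # will provide de-identified extracts


def is_eligible(p: Practice) -> bool:
    """Return True if a practice meets all inclusion criteria (one reading of the text)."""
    in_catchment = p.sla in CATCHMENT_SLAS
    multiple_gps = p.full_time_gps + p.part_time_gps > 1
    staffing = p.full_time_gps > 1 or p.part_time_gps > 2
    return (in_catchment and multiple_gps and staffing
            and p.has_or_will_install_audit_tool and p.willing_to_share_extracts)


def response_rate(consented: int, eligible: int) -> float:
    """Proportion of eligible practices that consented."""
    return consented / eligible


if __name__ == "__main__":
    print(f"{response_rate(17, 52):.0%}")   # prints 33%
```

Keeping the criteria in one function means a change to any threshold (for example, the part-time hours rule) is made in one place and applied uniformly to every candidate practice.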

Phase 2: Data cleansing and enhancement of data accuracy
Recruited practices undertook comprehensive "Data Cleansing" training, conducted by the researchers with the consent of the primary GPs and managers within each practice, to understand the use and functionality of PCS CAT and to update and clean the data stored in their clinical systems. Cleaning practice records improves searches (in both the practice electronic medical record software and PCS CAT) for patients with particular conditions, and thus helps target health research and patient management activities. This process ensured that all patient data entered during GP consultations could be identified when searching for specific variables. The data cleansing phase was conducted using the data maintenance utility tools available within both of the GP electronic medical record software programs used in the study. Data cleansing included:
− encouraging all practice staff to use the 'drop down box functionality' of their clinical software to define and code all medical diagnoses and other sections of the patient record;
− strictly avoiding free text entries in all sections of the patient record;
− finding all identifiable free text, non-coded past medical history items, and either linking them to appropriate coded items or replacing them with the correct coded item; and
− coding all inactive patients as 'Inactive' (an 'active patient' is one who has attended the practice three or more times in the past two years, as defined in the Royal Australian College of General Practitioners (RACGP) Standards for general practices [12]).
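The 'active patient' rule used to recode inactive patients can be checked mechanically from attendance dates. The sketch below assumes visit dates are available as Python `date` objects and approximates two years as 730 days; the function names are hypothetical, and this is not how PCS CAT or the clinical software implements the rule.

```python
from datetime import date, timedelta


def is_active(visit_dates, as_of, min_visits=3, window_days=730):
    """Return True if the patient attended at least `min_visits` times in the window.

    Follows the 'three or more attendances in the past two years' definition;
    two years is approximated as 730 days (an assumption).
    """
    window_start = as_of - timedelta(days=window_days)
    recent = [d for d in visit_dates if window_start <= d <= as_of]
    return len(recent) >= min_visits


def status_label(visit_dates, as_of):
    """Label to apply when recoding patient records during data cleansing."""
    return "Active" if is_active(visit_dates, as_of) else "Inactive"


if __name__ == "__main__":
    extract_date = date(2013, 9, 1)   # illustrative extraction date
    visits = [date(2012, 1, 5), date(2012, 6, 1), date(2013, 3, 3)]
    print(status_label(visits, extract_date))   # prints Active
```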

Phase 3: Data collection
Patient data that had been de-identified by practice employees was extracted to a database. Data items extracted from general practice clinical systems included: demography (population by age and sex, and population geography including postcodes and suburbs); chronic disease surveillance items (hypertension, type 2 diabetes mellitus, depression, anxiety, chronic obstructive pulmonary disease (COPD), asthma, congestive heart disease, stroke, osteoarthritis, osteoporosis, and high body mass index (BMI), i.e. overweight and obese); and Medicare Benefits Schedule (MBS) item uptake relevant to primary care services for GP and other non-referred attendances. A cleaned, de-identified PCS CAT data extract was obtained in September 2013 for all recruited practices; it included all information from patient interactions in the preceding 24 months on all diagnosed pathologies, clinical variables such as anthropometric measures, and patient demographic information such as age, sex, geographical location of residence (postcodes and suburbs) and Indigenous status.
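The high BMI item combines two categories derived from recorded weight and height. A minimal sketch, assuming weight in kilograms, height in centimetres and the WHO adult cut-offs (overweight 25 to under 30 kg/m², obese 30 kg/m² or more); the text does not state which cut-offs or unit conventions the extraction used, so these are assumptions.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight (kg) and height (cm)."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """WHO adult BMI categories; not valid for children, who need age- and sex-specific references."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "healthy weight"
    if value < 30:
        return "overweight"
    return "obese"


def is_high_bmi(value: float) -> bool:
    """'High BMI' taken here as overweight or obese, i.e. BMI >= 25."""
    return value >= 25


if __name__ == "__main__":
    for weight, height in [(80, 180), (90, 180), (100, 170)]:
        value = bmi(weight, height)
        print(f"{value:.1f}", bmi_category(value))
```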

Phase 4: Data collation and analysis
The research team collated all extracted data, cross-matched residential suburb and postcode information with health and clinical information using de-identified unique link ID tags, converted the resulting information into usable database formats, and analysed the datasets using Microsoft Excel (V2007: Microsoft Corporation, Redmond, Washington, USA). The final datasets therefore included clinical diagnosis and patient demographic information as entered by GPs within each participating practice. Basic epidemiological measures, including age-specific and total prevalence, were calculated for all major disease conditions. The prevalence figures were compared against comparable indicators reported for the same age groups by the Australian Health Survey (AHS) 2011-12 conducted by the ABS [13].
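Linking the demographic and clinical extracts on the de-identified link ID and computing age-specific prevalence can be sketched as follows. The study did this analysis in Microsoft Excel; this Python version is only an illustration, and the input shapes (a mapping from link ID to age, and a mapping from link ID to a set of coded conditions), the 5-year age bands and the open 85+ band are assumptions.

```python
from collections import defaultdict


def age_group(age: int) -> str:
    """Five-year age band, with 85+ as the open-ended top band (assumed banding)."""
    lower = min(age // 5 * 5, 85)
    return "85+" if lower == 85 else f"{lower}-{lower + 4}"


def age_specific_prevalence(demographics, diagnoses, condition):
    """Age-specific prevalence of `condition`.

    demographics: {link_id: age or None}
    diagnoses:    {link_id: set of coded conditions}
    Records without an age cannot be allocated to a band and are skipped.
    """
    population = defaultdict(int)
    cases = defaultdict(int)
    for link_id, age in demographics.items():
        if age is None:
            continue
        band = age_group(age)
        population[band] += 1
        if condition in diagnoses.get(link_id, set()):   # join on the link ID
            cases[band] += 1
    return {band: cases[band] / population[band] for band in population}


if __name__ == "__main__":
    demographics = {"a": 22, "b": 23, "c": 41, "d": None}
    diagnoses = {"a": {"asthma"}, "c": {"hypertension"}}
    print(age_specific_prevalence(demographics, diagnoses, "asthma"))
```

Patients with no recorded age (144 in this study) drop out of every age band, so age-specific denominators can sum to slightly less than the full sample.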
The age-specific disease prevalence figures obtained from the study sample and the estimated national prevalence figures reported by the AHS 2011-12 were then age-standardised using the 2011 estimated resident population of Australia [14]. Age-standardised prevalence was compared for all major chronic conditions that the SPDS project targets for regular surveillance, namely obesity, overweight, diabetes mellitus, hypertension, asthma, mental health disorders (clinically diagnosed depression and anxiety disorder), coronary heart disease, stroke, and chronic bone diseases such as osteoarthritis and osteoporosis. Both Microsoft Excel (V2007) and the PCS CAT tool (v.3.1: pencs.com.au) were used for graphical illustration of demographic data and age-specific disease prevalence.
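Age standardisation against a reference population is usually done by direct standardisation: each age-specific prevalence p_i is weighted by the standard population count w_i in that age band, giving Σ w_i p_i / Σ w_i. The text does not spell out the method, so direct standardisation is an assumption here. The sketch also assumes both inputs use identical age bands; the band labels and counts in the example are invented and are not study data.

```python
def direct_standardise(age_specific_prevalence: dict, standard_population: dict) -> float:
    """Directly age-standardised prevalence.

    Computes sum_i(w_i * p_i) / sum_i(w_i), where p_i is the prevalence in age
    band i and w_i is the standard population count in that band.
    """
    if set(age_specific_prevalence) != set(standard_population):
        raise ValueError("age bands of the rates and the standard population must match")
    total = sum(standard_population.values())
    weighted = sum(p * standard_population[band] for band, p in age_specific_prevalence.items())
    return weighted / total


def crude_prevalence(cases: dict, population: dict) -> float:
    """Crude prevalence: all cases over the whole sample, with no age weighting."""
    return sum(cases.values()) / sum(population.values())


if __name__ == "__main__":
    # Invented example: a young-skewed sample compared against an older standard.
    cases = {"0-39": 30, "40+": 60}
    sample = {"0-39": 600, "40+": 200}
    rates = {band: cases[band] / sample[band] for band in sample}
    standard = {"0-39": 500, "40+": 500}
    print(round(crude_prevalence(cases, sample), 4))        # 90 / 800
    print(round(direct_standardise(rates, standard), 4))   # (0.05 + 0.30) / 2
```

Because the invented sample is younger than the standard population, its crude prevalence understates the standardised figure; this is why the study compares age-standardised rather than crude prevalence against national estimates.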
The study was performed with the approval of the Human Research Ethics Committee (Health and Medical) of the University of Wollongong (HE 12/447). Because the study involved retrospective, secondary analysis of de-identified administrative data, written informed consent was not obtained from individual patients; all data was extracted and de-identified exclusively by trained practice clinical staff.

Results
The number of patients that had visited the 17 general practices within the previous 24 months (September 2011 to September 2013) was 164,435 (152,767 from within the Illawarra-Shoalhaven and 11,668 from outside of the catchment). The Illawarra-Shoalhaven catchment sample of 152,767 included 70,103 men, 82,506 women and 158 without an identified gender.
The median age of the study sample (n = 152,767) was 39 years (IQR = 20-58 years); 144 patients did not have their age recorded. Adults aged 20-24 years comprised the largest age group at 7% of the total sample, followed by the 40-44 year age group (6.8%) and children aged 5-9 years (6.6%). Older adults aged 65 years and above comprised 18.2% of the sample. The population pyramid of the study sample, compared with the population structure of the 2011 estimated resident population of the Illawarra-Shoalhaven catchment, is shown in Figure 3. The proportion of the local resident population of each SLA that consulted the study practices during the study period is shown in Table 1 [14]. Of all 164,435 patients who visited the study practices, 92.9% resided within the Illawarra-Shoalhaven catchment SLAs.
The age-specific population and disease counts within the study sample for major chronic conditions and for high BMI are shown in Tables 2 and 3 respectively, while the crude and age-standardised prevalence comparisons between the sample and the Australian national estimates [13] are shown in Tables 4 and 5 respectively. Overall, the study sample had higher age-standardised prevalence than the Australian averages for anxiety, cancer, hypertension, obesity and overweight/obesity (Figure 4). The age-specific burden of disease (Figure 5) indicates that the (non-age-standardised) prevalence of asthma and of mental health conditions (depression and anxiety) is considerably higher among younger age groups than among older adults.

Discussion
In Australia, nationally representative data is available through the National Health Surveys (NHS) conducted by the ABS [15], and regional data through state surveys such as the annual New South Wales Population Health Survey [16]; however, these data can be extrapolated only to a limited extent to smaller geographical areas such as Local Government Areas (LGAs) and small-area geographic regions within LGAs, such as SLAs and suburbs [10].
In 2011-12, the percentage of adults who saw a GP in the preceding 12 months varied across Australia from 74% to 86%, with the Illawarra-Shoalhaven catchment recording the highest percentage nationally [17]. This indicates that extracting records of patient-GP interactions over a 24-month period could capture health information for almost the entire resident population of catchment regions such as the Illawarra-Shoalhaven; however, not all SLAs were evenly represented in this study.
All general practice clinical and electronic medical record software uses one of several nationally validated health coding and medical classification systems, such as SNOMED-CT, DOCLE, PYEFINCH and ICPC2+ [10]. These medical vocabularies enable the recording of nationally/internationally recognised coded disease diagnoses, which also helps maintain accurate and consistent primary care clinical data that can be extracted and analysed [10]. Additionally, the recent introduction of the Personally Controlled Electronic Health Record (PCEHR) in Australia further requires general practices to "work towards recording the majority of diagnoses for active patients electronically" [18]. This enables accurate recording and easy identification of medical conditions, and hence provides an opportunity for effective public health and chronic disease surveillance. However, Aizpuru et al. [19] suggest that electronic health records yield lower prevalence estimates of chronic conditions than health surveys, because actions taken by physicians are often not recorded and cases are therefore missed. Systematic reviews have also illustrated the limitations of primary care practice-based data collection; common problems include inconsistent diagnostic coding vocabularies across clinical systems [20] and errors in data entry and recording [7,21].
Major barriers faced by general practice staff in this regard include: difficulties with clinical coding of diagnoses; complexities with software applications; a preference for entering free text rather than pre-coded options; inadequate information technology skills; time constraints; poor motivation and low prioritisation of data entry compared with other clinical duties; inconsistency of data entry; coding of a condition in order to justify the choice of prescribed treatment; the additional burden of including laboratory test results in patient records; and the need to enter a diagnosis even in the early preclinical stages of disease (Attard E, Ghosh A, Charlton K: Barriers faced by general practice staff in maintaining clean primary care databases: a systematic review, unpublished).
Studies conducted in Canada [22,23], Italy [24] and the UK [25] have demonstrated methods to improve the quality and accuracy of practice-based disease surveillance models. Keshavjee et al. [22] trained and employed 'data managers' to standardise disease coding and de-identify patient information. Similarly, Griever et al. [23] employed a trained data entry clerk to check for missing or incorrectly coded records. Cricelli et al. [24] trained the GPs themselves in data entry and the use of standard software. Pearson et al. [25] provided initial training and updates to all doctors and practice staff, and carried out validation procedures during random validation visits to all participating practices, such as verifying clinical coding and checking for rare diagnoses and for diagnoses made outside the usual age and sex parameters.
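The validation checks described for Pearson et al. [25], flagging diagnoses recorded outside usual age and sex parameters, can be approximated by rule-based screening of an extract before analysis. This is a hypothetical sketch: the rule table, its thresholds and the record fields are illustrative assumptions, not rules used by Pearson et al. or by the SPDS project, and any real rule set would need clinical input.

```python
# Hypothetical plausibility rules for illustration only.
PLAUSIBILITY_RULES = {
    "osteoporosis": {"min_age": 30},
    "type 2 diabetes mellitus": {"min_age": 10},
    "prostate cancer": {"sexes": {"M"}},
}


def flag_for_review(records, rules=PLAUSIBILITY_RULES):
    """Return link IDs of records whose diagnosis falls outside the rule's age or sex limits.

    Each record is a dict with 'link_id', 'condition', 'age' (may be None) and 'sex'.
    Flagged records are meant for manual verification, not automatic deletion.
    """
    flagged = []
    for rec in records:
        rule = rules.get(rec["condition"].strip().lower())
        if rule is None:
            continue  # no rule for this condition
        age = rec.get("age")
        too_young = age is not None and age < rule.get("min_age", 0)
        wrong_sex = "sexes" in rule and rec.get("sex") not in rule["sexes"]
        if too_young or wrong_sex:
            flagged.append(rec["link_id"])
    return flagged
```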
A number of limitations of the study need consideration. The SPDS study identified various common data entry errors, including misspelt suburb names, postcodes that did not match the suburb entry, missing geographic information (postcodes and suburbs), missing values for age and sex, incorrect or mismatched entries within data entry fields (such as height entered in the weight field or vice versa), and missing weight and/or height measurements. While geospatial analysis of disease patterns is highly useful for targeting services towards areas of need [26], the SPDS data highlighted difficulties in obtaining consistent information on patient residential postcodes and suburbs, including missing and mismatched entries; for example, one record listed a suburb in the Kiama (A) SLA with a postcode of 3000, which is incorrect for that suburb. Because it was often unclear which variable to correct to resolve such inconsistencies, these records were deleted from the analysis. Discrepancies in data quality and accuracy required substantial post-extraction data cleaning and editing by the researchers, which is vital to improve data linkage quality [27]. Further research and technological innovation are therefore needed to improve the utility and interface functionality of practice clinical desktop systems, and to create valid, easy-to-use advanced data aggregation systems; these could substantially improve primary care clinical data extraction and modelling, resulting in more accurate prevalence estimation.
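Two of the data entry errors described above, postcodes that do not match the recorded suburb and height and weight entered in each other's fields, can be screened for automatically after extraction. A minimal sketch, assuming a suburb-to-postcode lookup table (the two entries shown are illustrative and would need to be replaced by an authoritative concordance list) and assumed plausible adult ranges for height and weight. In keeping with the study's approach, suspicious records are flagged or excluded rather than silently corrected, because it is often unclear which field is wrong.

```python
# Illustrative suburb -> valid postcodes lookup; a real list would cover every suburb.
SUBURB_POSTCODES = {"KIAMA": {"2533"}, "WOLLONGONG": {"2500"}}

# Assumed plausible ranges for adults; children's records would need separate ranges.
ADULT_HEIGHT_CM = (120, 220)
ADULT_WEIGHT_KG = (30, 300)


def postcode_matches(suburb, postcode, lookup=SUBURB_POSTCODES):
    """True only if both fields are present and the postcode is valid for the suburb."""
    if not suburb or not postcode:
        return False
    return postcode.strip() in lookup.get(suburb.strip().upper(), set())


def _within(value, bounds):
    low, high = bounds
    return low <= value <= high


def check_anthropometry(height_cm, weight_kg):
    """Classify a height/weight pair as 'missing', 'ok', 'possible_swap' or 'implausible'."""
    if height_cm is None or weight_kg is None:
        return "missing"
    if _within(height_cm, ADULT_HEIGHT_CM) and _within(weight_kg, ADULT_WEIGHT_KG):
        return "ok"
    # Would the values be plausible if the two fields were exchanged?
    if _within(weight_kg, ADULT_HEIGHT_CM) and _within(height_cm, ADULT_WEIGHT_KG):
        return "possible_swap"
    return "implausible"


def geography_usable(records, lookup=SUBURB_POSTCODES):
    """Keep only records whose suburb and postcode agree; the rest are excluded."""
    return [r for r in records if postcode_matches(r.get("suburb"), r.get("postcode"), lookup)]
```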
Both international literature and Australian evidence identify a higher reported prevalence of overweight and/or obesity within primary care settings [28,29]. It has been argued that obese patients are more likely than healthy-weight or underweight patients to visit their physician, and also more likely to be weighed and measured by practice staff and clinicians. Because BMI can only be calculated for patients whose height and weight have been recorded, the population denominators for obesity and overweight are smaller, as also seen in our study (Table 3), and the resulting prevalence figures are arguably inflated. This is a further limitation of the proposed method of surveillance.
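The denominator effect can be made concrete by reporting prevalence among patients with a recorded BMI alongside bounds over the whole sample. A minimal sketch: the counts in the example are invented and are not study figures, and the bounds are simple worst-case assumptions about unmeasured patients rather than a method used in the study.

```python
def prevalence_bounds(cases: int, measured: int, total: int) -> dict:
    """Prevalence of high BMI under different assumptions about unmeasured patients.

    cases:    patients with a recorded BMI in the overweight/obese range
    measured: patients with a recorded BMI (the usual denominator)
    total:    all patients in the sample
    """
    if not 0 <= cases <= measured <= total or measured == 0:
        raise ValueError("expected 0 <= cases <= measured <= total and measured > 0")
    unmeasured = total - measured
    return {
        "among_measured": cases / measured,
        # Assumes none of the unmeasured patients has a high BMI.
        "lower_bound": cases / total,
        # Assumes every unmeasured patient has a high BMI.
        "upper_bound": (cases + unmeasured) / total,
    }


if __name__ == "__main__":
    print(prevalence_bounds(cases=280, measured=400, total=1000))
```

The width of the interval shows how strongly the estimate depends on who gets measured; if heavier patients are measured more often, the among-measured figure may overstate population prevalence.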
Despite these limitations, the SPDS study has significant implications for public health planning, primary health care delivery and epidemiological research. Apart from ongoing chronic disease surveillance, the study methodology and protocol also have the potential to provide evidence-based direction for population health planning strategies aimed at addressing the local health needs of regional areas of Australia. The most recent planning documents for the Illawarra-Shoalhaven region of NSW, at both the Local Health District level [31] and the Medicare Local level [32], report disease rates and health status indicators drawn from statistically modelled estimates from the 2006-07 Australian National Health Surveys. These figures are considerably outdated for planning purposes in 2014, and their generalisability to regional and smaller-area disease prevalence and health status is questionable [10]. The proposed surveillance system also provides an opportunity to monitor trends in chronic disease prevalence at regular intervals, and promotes the engagement of general practice staff and clinicians in maintaining the quality and accuracy of primary care clinical data. Including a larger number of sentinel sites that are representative of the population being served would provide an accurate, region-specific system for population health planning at the primary care level, in order to improve the overall health of the community.

Conclusion
This study has demonstrated that extracting patient clinical data from general practice settings is a feasible and valid method for obtaining a region-specific estimate of the prevalence of chronic diseases and mental health disorders within regional NSW, Australia. The study included those general practices that volunteered to participate; however, further work on sampling methodology is required to identify which sentinel sites would provide an accurate and truly representative surveillance system. Technological updates to general practice clinical software systems are recommended to improve functionality and data quality within general practice databases. Drop-down menus with fixed, nationally recognised lists of suburb names, cross-matched with the correct postcode and state information, are currently lacking within general practice clinical software systems. Additionally, making age, sex, postcode and suburb mandatory fields when creating a new patient record could eliminate missing data for these essential socio-demographic variables. Investment in computer skills and data entry training for general practice staff, and advances in data aggregation instruments, are essential to improve the quality of clinical data and their collection methods for effective use by researchers and population health planners for surveillance purposes. Chronic disease prevalence figures obtained annually through the surveillance methodology implemented by the SPDS project could provide more up-to-date and granular health information for prompt health service planning.
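The two software recommendations, mandatory socio-demographic fields and a fixed suburb list cross-matched with postcodes, amount to validation at the point of record creation. A hypothetical sketch of such a check, assuming a record is a simple dictionary and a suburb-to-postcode lookup is available; it does not describe the behaviour of Medical Director, Best Practice or any other clinical software.

```python
REQUIRED_FIELDS = ("age", "sex", "postcode", "suburb")


def validate_new_record(record: dict, suburb_postcodes: dict) -> list:
    """Return a list of problems; an empty list means the record may be saved.

    suburb_postcodes maps an upper-case suburb name to the set of valid postcodes,
    standing in for a fixed drop-down list of suburbs.
    """
    # Mandatory fields: None or an empty string counts as missing (age 0 is allowed).
    errors = [f"missing {field}" for field in REQUIRED_FIELDS
              if record.get(field) in (None, "")]
    suburb = record.get("suburb")
    postcode = record.get("postcode")
    if suburb:
        valid_postcodes = suburb_postcodes.get(suburb.strip().upper())
        if valid_postcodes is None:
            errors.append("suburb not in list")
        elif postcode and postcode.strip() not in valid_postcodes:
            errors.append("postcode does not match suburb")
    return errors


if __name__ == "__main__":
    lookup = {"KIAMA": {"2533"}}   # illustrative single-entry lookup
    print(validate_new_record({"age": 40, "sex": "F", "suburb": "Kiama", "postcode": "3000"}, lookup))
```

Rejecting a record at entry time moves the decision about which field is wrong to the person who can check with the patient, rather than leaving it to researchers after extraction.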