
A data quality assessment to inform hypertension surveillance using primary care electronic medical record data from Alberta, Canada



Background

Hypertension is a common chronic condition affecting nearly a quarter of Canadians. Hypertension surveillance in Canada typically relies on administrative data and/or national surveys. Routinely captured data from primary care electronic medical records (EMRs) are a complementary source for chronic disease surveillance, with longitudinal patient-level details such as sociodemographics, blood pressure, weight, prescribed medications, and behavioural risk factors. As EMR data are generated from patient care and administrative tasks, assessing data quality is essential before using them for secondary purposes. This study evaluated the quality of primary care EMR data from one province in Canada within the context of hypertension surveillance.


Methods

We conducted a cross-sectional, descriptive study using primary care EMR data collected by two practice-based research networks in Alberta, Canada. There were 48,377 adults identified with hypertension from 53 clinics as of June 2018. Summary statistics were used to examine the quality of data elements considered relevant for hypertension surveillance.


Results

Patient year of birth and sex were complete, but other sociodemographic information (ethnicity, occupation, education) was largely incomplete and highly variable. Height, weight, body mass index and blood pressure were complete for most patients (over 90%), but a small proportion of outlying values indicated that data inaccuracies were present. Most patients had a relevant laboratory test present (e.g. blood glucose/glycated hemoglobin, lipid profile), though a very small proportion of values were outside a biologically plausible range. Details of prescribed antihypertensive medication, such as start date, strength, dose, and frequency, were mostly complete. Nearly 80% of patients had a smoking status recorded, though only 66% had useful information (i.e. categorized as current, past, or never), and fewer than half had their alcohol use described; information related to amount, frequency or duration was not available.


Conclusions

Blood pressure and prescribed medications in primary care EMR data demonstrated good completeness and plausibility, and contribute valuable information for hypertension epidemiology and surveillance. Other clinical, laboratory, and sociodemographic variables should be used carefully due to variable completeness and suspected data errors. Additional strategies to improve these data at the point of entry and after data extraction (e.g. statistical methods) are required.



Background

Hypertension is a common chronic condition, affecting more than one in five Canadians, and is associated with an increased risk of cardiovascular disease and mortality, as well as considerable economic and societal costs [1]. Monitoring the incidence and prevalence of hypertension over time is an important part of surveillance systems and public health activities. In Canada, administrative databases, which include in-patient hospital discharges and physician billing claims, are often used to report on hypertension prevalence estimates, such as the Canadian Chronic Disease Surveillance System (CCDSS) [2]. While administrative sources provide population-level data for those who have encountered the healthcare system, there is a lack of clinical details that are essential for better understanding the patient context and disease severity, including blood pressure (BP), body mass index (BMI), and lifestyle risk factors. Physical measures surveys are another commonly used source, as they obtain directly measured BP coupled with health-related interviews, as achieved by the Canadian Health Measures Survey (CHMS) [3]. However, these surveys are costly to maintain, response rates are often low, and the cross-sectional design does not allow for longitudinal follow-up.

A contemporary approach to hypertension surveillance is utilizing the clinically-generated, detailed data from electronic medical records (EMR), particularly from primary care settings where chronic conditions are largely diagnosed and managed [4, 5]. EMR adoption among Canadian family physicians is growing, with an estimated 83% now using EMRs in practice to some degree in 2018 [6]. Additionally, linkages between primary care EMR and administrative data can further enhance surveillance opportunities by providing a more complete perspective of disease manifestation and current management practices. Because EMR data are recorded to support individual patient care and administrative tasks, they may not be produced with the same standardization and rigor as research data; as such, some concern exists about their re-use for secondary purposes [7]. Therefore, investigations into data quality are necessary to determine whether the data are ‘fit for purpose’. Previous studies evaluating the quality of primary care EMR data in Canada have typically reported on a limited aspect of quality (e.g. completeness) or data elements [8,9,10] or have assessed quality more broadly without focusing on a specific context for use [11, 12]. The objective of this study was to comprehensively assess the quality of primary care EMR data in Alberta, Canada within the context of hypertension.


Methods

Data source

The Canadian Primary Care Sentinel Surveillance Network (CPCSSN) is a collaboration of eleven practice-based research networks (PBRN) across Canada who manage the extraction, cleaning and processing of de-identified EMR data from primary care settings [13]. At present, over 1200 primary care providers and 1.8 million patients contribute data from eight provinces and territories [14]. National CPCSSN data have been previously used to report on the epidemiology of many conditions in primary care, such as hypertension [5], diabetes [15], depression [16], osteoarthritis [17], dementia [18], chronic obstructive pulmonary disease [19], and others. The CPCSSN organization and data extraction and processing have been described elsewhere [13, 20].

This data quality assessment utilized primary care EMR data obtained by the two PBRNs in the province of Alberta – the Northern and Southern Alberta Primary Care Research Networks (NAPCReN and SAPCReN, respectively). Because healthcare in Canada is organized and delivered separately within each province or territory, only one province (Alberta) was chosen for the data quality assessment in order to minimize variation in the data due to interprovincial differences such as healthcare delivery and practice, drug coverage, health information legislation, EMR uptake and extent of use, types of EMR systems available, and many other factors [21, 22].

In Alberta, there were 323 providers (mostly family physicians with a small proportion of nurse practitioners and community pediatricians) participating from 53 primary care practices. This represents slightly over 5% of the total number of family physicians in Alberta [23]. As of June 2018, de-identified EMR data were extracted from 397,518 patients in total; this reflected approximately 9.2% of Alberta’s general population of 4.3 million people [24]. The CPCSSN data have previously been found to overrepresent older adults and women [25], but this is typical of primary care populations.

Currently, CPCSSN in Alberta extracts from five distinct EMR systems – Wolf, Med Access, Practice Solutions Suite, Accuro and Healthquest. The earliest (or ‘start’) date of information in the CPCSSN database varies by clinic and by patient, depending on when a clinic first implemented their EMR system, as well as when the patient first attended the clinic.

Patient sample

Adult patients (18 years and older) who had at least one primary care encounter in the previous two years (July 1, 2016 to June 30, 2018) were included, in order to establish an ‘active’ patient population. Any patient who was recorded as ‘deceased’ or ‘inactive’ in the EMR was excluded, as were any patients or providers who had explicitly requested to opt out of the CPCSSN database. The data quality assessment focused specifically on patients with hypertension who were identified using a CPCSSN-developed definition [26]. The hypertension definition consisted of a combination of International Classification of Disease version nine (ICD-9) codes (401, 402, 403, 404, 405) and medications located throughout the EMR: a minimum of two physician billing codes within two years or any occurrence of a diagnosis in the Problem List/Profile or prescription for an anti-hypertensive medication (with medication criteria alone being insufficient if other specific diagnoses exist, such as heart failure or diabetes) [26]. The definition was validated using chart reviews as the reference standard and demonstrated good sensitivity (84.9%) and specificity (93.5%) [26].
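The logic of this case definition can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the validated CPCSSN algorithm [26]: the function and parameter names are hypothetical, "within two years" is approximated as 730 days, and the set of diagnoses that override the medication criterion is reduced to a single boolean flag.

```python
from datetime import date

# ICD-9 code groups named in the text for hypertension.
HTN_ICD9_PREFIXES = ("401", "402", "403", "404", "405")


def is_htn_code(code):
    """True if an ICD-9 code falls under 401-405."""
    return code.strip().startswith(HTN_ICD9_PREFIXES)


def two_billings_within_two_years(billing_dates):
    """True if any two billing dates are at most 730 days apart (assumed window)."""
    ordered = sorted(billing_dates)
    # Checking adjacent pairs is enough: the closest pair is always adjacent.
    return any((b - a).days <= 730 for a, b in zip(ordered, ordered[1:]))


def meets_definition(billing, problem_list_codes, has_antihypertensive_rx,
                     has_excluding_dx):
    """billing: list of (date, icd9_code) tuples for one patient (hypothetical layout)."""
    htn_dates = [d for d, code in billing if is_htn_code(code)]
    if two_billings_within_two_years(htn_dates):
        return True
    if any(is_htn_code(c) for c in problem_list_codes):
        return True
    # Medication alone does not qualify when another indication
    # (e.g. heart failure or diabetes) is present.
    return has_antihypertensive_rx and not has_excluding_dx


patient_billing = [(date(2017, 1, 5), "401"), (date(2018, 3, 1), "401.9")]
print(meets_definition(patient_billing, [], False, False))
```

Keeping the three criteria as separate checks makes it easy to see which one identified a given patient, which is useful when comparing against a chart review.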

Data quality assessment

The data quality assessment was a cross-sectional, descriptive evaluation guided by reporting recommendations for distributed data networks [27]. Data elements were selected based on their potential use and relevancy for hypertension surveillance, as well as availability in the CPCSSN data. These included: patient demographics; physical examinations (weight, height, body mass index [BMI], and systolic and diastolic blood pressure); laboratory values (high density lipoprotein [HDL] cholesterol, low density lipoprotein [LDL] cholesterol, total cholesterol, triglycerides, fasting blood glucose, glycated hemoglobin [HbA1C]), anti-hypertensive medications (defined using categories of the relevant groups of Anatomical Therapeutic Chemical [ATC] codes: C02*, C03*, C07*, C08*, C09*); and risk factor records for smoking and alcohol use. Only the CPCSSN-processed/coded values were used, as these are typically the data elements that are accessible from CPCSSN for secondary purposes. A full description of all data elements can be found in the CPCSSN Data Dictionary online [14].
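As a small illustration of the medication selection, the five ATC groups listed above can be matched by code prefix. The function name is hypothetical and the sketch assumes each medication record carries a single ATC code string; how CPCSSN stores these codes is not described here.

```python
# ATC groups named in the text: C02*, C03*, C07*, C08*, C09*.
ANTIHYPERTENSIVE_ATC_PREFIXES = ("C02", "C03", "C07", "C08", "C09")


def is_antihypertensive(atc_code):
    """True if the ATC code belongs to one of the five groups above."""
    return atc_code.strip().upper().startswith(ANTIHYPERTENSIVE_ATC_PREFIXES)


print(is_antihypertensive("C09AA05"))  # ramipril, an ACE inhibitor
print(is_antihypertensive("A10BA02"))  # metformin, not an anti-hypertensive
```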

Summary statistics were reported for continuous variables, which included range, mean, and median. Proportions (restricted to the three most frequent values) and number of unique values were described for categorical variables. Missingness was reported as a proportion of patients without a recorded data element (e.g. height) or record (e.g. medication, smoking); missingness of specific items within a record was also reported (e.g. dose in medication record). Data completeness was also represented visually by clinic and EMR type.
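The summaries described above could be computed along the following lines. This Python sketch is illustrative only (the analysis itself was run in R): the function names and sample values are made up, `None` is assumed to mark a patient with no recorded value, and category proportions are taken over recorded values, which is one of several possible denominators.

```python
from collections import Counter
from statistics import mean, median


def summarize_continuous(values):
    """Missingness (%) plus range, mean and median of the recorded values."""
    present = [v for v in values if v is not None]
    return {
        "missing_pct": 100 * (len(values) - len(present)) / len(values),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }


def summarize_categorical(values, top=3):
    """Number of unique entries and the proportions of the most frequent ones."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "missing_pct": 100 * (len(values) - len(present)) / len(values),
        "unique": len(counts),
        "top": [(cat, n / len(present)) for cat, n in counts.most_common(top)],
    }


heights_cm = [172.0, None, 160.0, 181.0, None]  # made-up example values
print(summarize_continuous(heights_cm))
print(summarize_categorical(["teacher", "Teacher", "nurse", None]))
```

The second call shows why the number of unique values is informative: free-text entries that differ only in capitalization count as distinct categories.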

Several temporal aspects of the data were examined – the proportion of patients who had at least one physical exam measurement or laboratory value documented in the previous year (July 1, 2017 to June 30, 2018) was reported, in addition to the proportion of risk factor (i.e. smoking and alcohol) and medication records that contained a stop/end date prior to the start date. Patient-level weight values over time were explored by plotting the difference between subsequent weight measurements against the length of time (days) between them for individuals with at least two weight measurements.
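Two of these temporal checks can be sketched in Python as below. The record layouts and function names are assumptions made for illustration: records are taken to be `(start_date, stop_date)` pairs, weights to be `(date, weight_kg)` pairs for one patient, and records lacking either date are left out of the denominator.

```python
from datetime import date


def proportion_stop_before_start(records):
    """Share of fully dated records whose stop date precedes the start date."""
    dated = [(s, e) for s, e in records if s is not None and e is not None]
    if not dated:
        return 0.0
    return sum(e < s for s, e in dated) / len(dated)


def successive_differences(measurements):
    """Return (days_between, weight_change) for each pair of consecutive measurements."""
    ordered = sorted(measurements)  # sorts by date first
    return [((d2 - d1).days, w2 - w1)
            for (d1, w1), (d2, w2) in zip(ordered, ordered[1:])]


weights = [(date(2018, 1, 11), 82.0), (date(2018, 1, 1), 80.0)]
print(successive_differences(weights))
```

Plotting the second element of each pair against the first for all patients gives the kind of display described in the text.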

External validity was evaluated by comparing the most recent crude hypertension prevalence estimates from three national population-level sources: administrative data from the Canadian Chronic Disease Surveillance System (CCDSS), consisting of physician billing claims, hospitalizations and prescription drug records [28]; the Canadian Health Measures Survey (CHMS), which defines hypertension based on standardized, direct BP measurements and health-related interviews [29]; and self-reported high BP from the Canadian Community Health Survey (CCHS) [30]. Hypertension prevalence estimates from the national CPCSSN data [5] were also used as a comparison to the regional-level (Alberta) data.

RStudio version 1.1.456 was used for the analysis, which was conducted in 2019. This study was approved by the University of Calgary’s Conjoint Health Research Ethics Board (REB17–1825) and the University of Alberta’s Health Research Ethics Board (Pro00079372).


Results

In the CPCSSN data for Alberta, there were 205,364 adult patients who had at least one primary care encounter in the previous two years; of these, 48,377 patients were identified with hypertension and were not labelled ‘inactive’ at the practice or deceased. Patients in the hypertension sample had a median of 8.0 years (IQR 7) of information in their record. Figure 1 provides a visual summary of the completeness of data for patient demographics, physical measurements, and smoking status by each of the 53 clinics and 5 EMR systems included in the data quality assessment. The data element characterization in Tables 1, 2, 3 and 4 provides a more in-depth examination of the quality of hypertension-related variables.

Fig. 1 Summary of Completeness of Select EMR Data Elements by EMR Type and Clinic for Patients with Hypertension

Table 1 Missingness and summary statistics for patient demographic information
Table 2 Missingness and summary statistics for physical measurements and laboratory values
Table 3 Missingness and summary statistics for anti-hypertensive medications
Table 4 Missingness and summary statistics for risk factor records

Patient demographics

Birth year was complete for all patients, as was sex (with the exception of two patients). However, the other sociodemographic information was mostly incomplete (Table 1). For those who had some information recorded in ethnicity, occupation, or education fields, the data were highly inconsistent – for instance, over 3500 unique entries were recorded for occupation and more than 75 distinct entries were found for ethnicity.

Height, weight, BMI

Approximately 10% of patients were missing a height or BMI value and even fewer patients were missing weight (Table 2). Males had a median of four measurements for height, weight, or BMI and females had five, with these measurements showing a skewed distribution. From the lower and upper ranges of the height, weight and BMI values, it appears that data errors are present. For example, female weight values ranged from 1.8 to 477 kg, which is biologically unlikely. When plotting patient-level height and weight values (Fig. 2), those located outside the main cluster of points visually identify specific data errors. For instance, the vertical line of points approaching 0 on the x (weight) axis might indicate a data entry error (e.g. weight entered as 10 instead of 100) or swapped height and weight values (e.g. height in metres entered in the weight field). Another observable area of atypical points was between 150 and 200 on the x (weight) axis, which potentially represents height and weight values that were entered in the wrong fields (e.g. weight = 175 and height = 100 recorded instead of weight = 100 kg and height = 175 cm).

Fig. 2 Paired Height and Weight Measurements in Patients with Hypertension

Figure 3 investigates possible errors in weight values using successive patient-level weight measurements for those who had at least two weight values recorded in their EMR (n = 39,202). It would be expected that changes in individual weight might demonstrate more variability over time (e.g. patient weight recorded 10 days apart should have minimal difference, whereas weight measurements taken several years apart might show a more significant change). Two peaks centred around 100 and −100 on the y-axis emerged as potentially problematic data: in a relatively short time period between measurements, the difference between successive weight measurements was approximately 100 kg for patients clustered around those two peaks. This likely represents inconsistencies in the unit of measurement (e.g. kilograms versus pounds) for subsequent weight measurements for a given individual. However, the extent of the problem was not substantial: the majority of weight values (94.8%) occurred within two standard deviations of the central peak (mean = −0.29), although at least one potential data error (i.e. a value outside two standard deviations) at any time was detected in the records of 18.4% of patients.

Fig. 3 Differences in Subsequent Patient Weight Measurements Throughout Time in Patients with Hypertension. Note: patients with two or more weight values recorded at any time in their EMR (n = 39,202)
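The two-standard-deviation rule used to flag potential weight errors can be illustrated with a short Python sketch. The function name and the example differences are made up, and the sample standard deviation is assumed; the text does not state which estimator was used.

```python
from statistics import mean, stdev


def flag_outliers(differences, k=2.0):
    """Flag differences lying more than k standard deviations from the mean."""
    centre = mean(differences)
    spread = stdev(differences)  # sample standard deviation (assumption)
    return [abs(d - centre) > k * spread for d in differences]


# Made-up successive weight differences in kg; the 100.0 mimics a unit mix-up.
diffs = [0.5, -0.3, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3, 100.0, -0.1]
flags = flag_outliers(diffs)
print([d for d, flagged in zip(diffs, flags) if flagged])
```

Because extreme values inflate both the mean and the standard deviation, a rule of this kind is conservative: a few very large errors can mask smaller ones.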

Blood pressure

BP measurements were well-recorded in terms of completeness (99%) and the majority of patients (85%) had at least one measurement recorded in the previous year (Table 2). However, BP values at the minimum and maximum end of the range may indicate data errors (Table 2). These values could be biologically possible, but would be very unlikely in an outpatient setting; for instance, a systolic BP of 52 might indicate shock and a systolic BP of 290 would be an emergency event. In addition, CPCSSN also sets limits to BP values when processing the raw EMR data (50–300 mmHg for sBP; 20–200 mmHg for dBP), which would underestimate the true range of values.
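The processing limits quoted above amount to a simple range check, sketched here in Python. The function name is hypothetical and the bounds are assumed to be inclusive; the exact CPCSSN rule is not described in this section.

```python
# Processing limits quoted in the text (mmHg).
SBP_LIMITS = (50, 300)
DBP_LIMITS = (20, 200)


def within_processing_limits(sbp, dbp):
    """True if both readings fall inside the stated limits (inclusive, by assumption)."""
    return (SBP_LIMITS[0] <= sbp <= SBP_LIMITS[1]
            and DBP_LIMITS[0] <= dbp <= DBP_LIMITS[1])


print(within_processing_limits(120, 80))
print(within_processing_limits(52, 30))   # retained, though clinically unlikely
print(within_processing_limits(310, 80))  # dropped by the limits
```

The second call illustrates the point made in the text: values that are implausible for an outpatient visit can still pass wide processing limits.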

Male patients had a median of 16 total BP measurements recorded in their EMR and females had slightly more (median = 18). A small proportion of patients had large sums of annual BP measurements – for instance, 4.2% of females and 4.0% of males were above the 95th percentile for number of BP measurements (greater than 10) in 2017 (data not shown).

Laboratory values

Of the laboratory tests measuring blood glucose, HbA1C values were present in the EMR more often than fasting glucose (88% versus 79% of patients), and more patients had an HbA1C test result in the previous year compared to the fasting glucose test (Table 2).

The lipid values included in this assessment (LDL, HDL, total cholesterol, triglycerides) were available for the majority of patients in this cohort (at least 91%, varying by lab type), with a median of 4–5 values for each patient in the EMR (Table 2). Female patients had slightly fewer lipid values present in their EMR compared to male patients (Table 2).

For all types of lab results, the upper and lower limits were unlikely to be seen in an outpatient setting (i.e. primary care) and many values were beyond a biologically plausible range (e.g. HDL and LDL lower value = 0). This points to likely data errors at the upper and lower ends of the range of values; however, these affected only a very small proportion of lab values.

Hypertensive medications

The vast majority of males (92%) and females (93%) with hypertension had at least one recorded anti-hypertensive prescription, with a median of six anti-hypertensive medication prescriptions per person (Table 3). The medication records themselves were fairly complete; all records contained a start date and most contained a stop date, strength, dose, frequency, duration, and count. Drug Identification Number (DIN) and ‘reason for medication’ were mostly incomplete, with DIN missing in over half of medication records and ‘reason’ missing in over three-quarters of records.

Smoking and alcohol status

Within the Risk Factor section in the EMR, nearly 80% of patients had a smoking status recorded, with ‘Unknown’ and ‘Never’ as the most frequently recorded categories (Table 4). However, after excluding the indiscriminate ‘Unknown’ smoking status, a total of 31,976 patients (66.1%) and 68,110 records remained across three categories: ‘Current’, ‘Past’ or ‘Never’ (data not shown). Males and females had a similar number of smoking records per person (median = 1; mean = 3). All start and end dates were missing from the records.

More males than females had their alcohol use recorded (47 and 40%, respectively) and these records were primarily for ‘Current’ users (Table 4), indicating that alcohol use is likely recorded differentially between users and non-users. Patients had a mean of 2 records in their EMR (median of 1) and no records contained start or end dates.

Of note, a ‘Date Created’ field exists for both smoking and alcohol records. This field indicates when the record was created in the EMR system but does not necessarily correspond to the start of the risk behaviour. ‘Date Created’ was present in 80.6% of smoking records and 76.6% of alcohol use records.

External validity

The overall crude estimate for Alberta-specific hypertension prevalence in the CPCSSN data (23.6%) was similar to that of the 2014–15 physical measures survey (CHMS) (23.3%) and was also comparable to the national CPCSSN estimate (22.8%) (Table 5). The largest discrepancy was seen in the self-reported CCHS, with hypertension prevalence estimated at 17.7%. Male patients in the CPCSSN database had a higher hypertension prevalence (26.1%) than all other sources, while the prevalence for female patients (21.6%) in the CPCSSN data was similar to the health measures survey (22.0%) and slightly lower than the CCDSS (25.6%).
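The crude Alberta estimate appears to follow directly from the sample counts reported earlier (48,377 patients with hypertension among 205,364 active adult patients). A quick check in Python, with variable names chosen for illustration:

```python
hypertension_patients = 48_377   # adults identified with hypertension
active_adult_patients = 205_364  # adults with an encounter in the previous two years

crude_prevalence_pct = 100 * hypertension_patients / active_adult_patients
print(f"{crude_prevalence_pct:.1f}%")  # 23.6%
```

This is an unadjusted proportion; no age or sex standardization is applied, which should be kept in mind when comparing it with the other sources.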

Table 5 Prevalence comparison of adults with hypertension in various data sources


Discussion

This paper describes the quality of primary care EMR data in Alberta within the context of utilization for hypertension surveillance and epidemiology. Overall, there was observable variability due to the type of EMR system, between clinics, and among the data elements themselves. As this assessment focused on patients with hypertension, it was not surprising to see blood pressures and prescribed medication records that were largely complete and contained minimal outliers; these data constitute a particularly valuable contribution for surveillance purposes, given that BP and prescribing information are not available in administrative data or are limited (i.e. cross-sectional) in survey data. Although these data cannot confirm whether a patient has filled their prescription or is adherent, the information within the medication records is relatively complete and can be used to approximate persistence/adherence, for example, by calculating medication possession ratio or using similar methods [31].
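For readers unfamiliar with the medication possession ratio, it is commonly defined as the total days of medication supplied during an observation period divided by the number of days in that period. The Python sketch below is a generic illustration, not the method of reference [31]: the function name is hypothetical, prescription duration is assumed to stand in for days supplied (the EMR records prescribing, not dispensing), and the ratio is capped at 1.0, which is a frequent but optional convention.

```python
def medication_possession_ratio(days_supplied, period_days):
    """Total days supplied divided by days in the observation period, capped at 1.0."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return min(sum(days_supplied) / period_days, 1.0)


# Made-up example: three prescriptions of 90, 90 and 60 days over one year.
print(round(medication_possession_ratio([90, 90, 60], 365), 2))  # 0.66
```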

The select laboratory values were present in the EMR of the majority of patients in this cohort, with the exception of fasting blood glucose. This aligns with current clinical guidelines recommending routine testing of lipids and blood glucose/glycated hemoglobin for individuals with hypertension [32]. Although most laboratory test results in Alberta are imported directly into the EMR from the community lab provider, data quality issues were still present, albeit to a very small degree. The observed range of values demonstrated upper and lower limits that are not likely in an outpatient setting and some that were biologically implausible (e.g. 0 mmol/L for LDL and HDL; 43 mmol/L for fasting glucose). These errors may have been introduced during the import of lab results to the EMR or during the CPCSSN processing to convert different units of measurement to a standard unit (e.g. mmol/mol to % for HbA1C).

Other information, such as sociodemographic, height, weight, and risk factor information, was more inconsistent and less complete. Although achieving 100% completeness for all data elements may not be realistic, it is not unreasonable to aim for near complete information for these data elements at the point of care. Cardiovascular disease guidelines suggest that smoking status should be updated on a regular basis and given that screening is often risk based, information about alcohol use, height, weight, BMI, and ethnicity are particularly important to document for a hypertensive cohort [32]. However, distinguishing between data that are missing due to inadequate data entry or as a result of not extracting the data is difficult. One significant challenge when addressing poor data quality is determining the source of the issue – for instance, missing data may be due to the unavailability of these fields in certain EMR systems (in which case, missingness will always exist); patients might not be asked about specific topics, such as alcohol use or ethnicity, or they may decline to answer; lastly, the CPCSSN processes may omit extraction from certain fields of the EMR either deliberately (e.g. identifiable fields or physician notes) or unintentionally (e.g. if an EMR system upgrade changes the names of data elements, which would subsequently affect the CPCSSN extraction code). Identifying true inaccuracies in the data is similarly problematic; this may be possible for some data elements through a chart review, with particular attention to the detailed physician notes and scanned documents (i.e. specialist letters, diagnostic imaging) that are not currently captured in the CPCSSN data. However, this is a time-intensive method and the structured EMR fields are likely to contain the same errors and omissions as the CPCSSN data. Beyond this, confirming with or measuring patients directly to verify data elements in the EMR would most accurately reveal true data errors, but this method is also the least feasible.

Therefore, the most appropriate strategies for preventing and mitigating EMR data quality issues should be multifaceted and involve a variety of settings. CPCSSN has largely taken a post-extraction analytic approach to data improvement. This includes extensive cleaning and coding algorithms, the development and validation of case definitions for various conditions that are made available as part of the database [26, 33,34,35], and exploration of more advanced techniques like natural language processing [36] and machine learning [35]. As an example, CPCSSN is currently developing a pattern-matching algorithm that aims to enhance the completeness and accuracy of smoking records. In the raw or original EMR data, some additional information related to smoking, such as frequency of tobacco use and quantity of tobacco units consumed (e.g. cigarettes / cigars, packs), is present but primarily in unstructured, lengthy text strings that are not useful for analysis, may also contain identifiable patient information, and are therefore not currently available to researchers. The pattern-matching algorithm is designed to extract only smoking-related information from the free text and categorize the record into a defined smoking status, leading to more available coded data for researchers to access.
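To give a sense of what pattern matching on free-text smoking entries can look like, here is a deliberately simple Python sketch. It is not the CPCSSN algorithm, whose details are not described here: the patterns, category order and function name are all hypothetical, and a production approach would need far broader vocabulary, negation handling and validation against chart review.

```python
import re

# Checked in order, so "non-smoker" and "ex-smoker" are caught before "smoker".
SMOKING_PATTERNS = [
    ("Never", re.compile(r"\b(never smoked|never smoker|non-?smoker)\b", re.I)),
    ("Past", re.compile(r"\b(ex-?smoker|former smoker|quit|stopped smoking)\b", re.I)),
    ("Current", re.compile(r"\b(current smoker|smokes|smoker|\d+\s*(cigarettes|packs?))\b", re.I)),
]


def classify_smoking(free_text):
    """Map a free-text entry to 'Never', 'Past', 'Current' or 'Unknown'."""
    for status, pattern in SMOKING_PATTERNS:
        if pattern.search(free_text):
            return status
    return "Unknown"


for note in ["Ex-smoker, quit 2010", "smokes 10 cigarettes/day", "Non-smoker", "BP follow-up"]:
    print(note, "->", classify_smoking(note))
```

Returning only the category, rather than the matched text, reflects the goal stated above of producing coded data without exposing potentially identifiable free text.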

A number of other strategies have been shown to improve the completeness and accuracy of EMR data – some occur at the practice level, such as employing a dedicated data entry clerk [37] or providing data quality audit and feedback reports to clinicians [38]. Other initiatives require more substantial resources and uptake, such as mandated national EMR content standards [39], developing EMR interfaces that are easier to navigate and contain more structured fields, and promoting financial or other incentives for ‘meaningful EMR use’ [40].

In the future, routine linkage to other data sources, like administrative health data, could enhance quality by providing a mechanism to verify certain aspects of EMR data and expand the breadth of information about individual patients throughout the broader healthcare system.


Limitations

This paper provides a quality assessment of select CPCSSN data elements deemed to be important for hypertension surveillance or research, but it was not possible to examine and report on all variables contained in the CPCSSN database in a single manuscript, nor was it possible to examine discrete cardiovascular outcomes related to hypertension (for example, hospitalization for myocardial infarction), as this information is usually contained in other databases external to CPCSSN or captured in the EMR in an inaccessible format (e.g. PDF document, free text notes). It was also not feasible to quantify the true accuracy of data elements, other than appraising the plausibility of values through descriptive means. Secondly, during the CPCSSN processing and data transformation stages for physical exam measurements and some lab types, restrictions are introduced for out-of-bounds values and thus, the summary statistics presented in this paper may not reflect the full variation of values originating from the source EMR. In addition, any changes or improvements made to the CPCSSN processing may result in slight differences in the CPCSSN EMR database between each extraction cycle. Thirdly, although the CPCSSN definition for hypertension demonstrated high sensitivity and specificity, a potential for misclassification still exists. This may have underestimated the number of patients with hypertension or produced a patient sample that is biased towards a greater severity of illness. Lastly, the quality was described specifically for CPCSSN data from Alberta and within the context of hypertension. This is not a population-level data source and only constitutes a sample of participating providers and patients who have sought care. Thus, the overall findings may not be representative of the wider Alberta population or of other provinces or territories that participate in CPCSSN, and may also differ in other disease-based contexts. However, CPCSSN has developed uniform data extraction, processing, and standardization methods across the country, which may allow other regional networks to carry out the same data quality assessment for comparison.


Conclusions

Primary care EMR data are a valuable data source for hypertension surveillance and epidemiology. The high-quality and longitudinal blood pressure and prescribed antihypertensive medication data are particularly useful, as these types of data are not found in traditional administrative databases. Other data elements, such as sociodemographics, physical examination values, laboratory results, and risk factor information, exhibited variation in quality. These data elements may be less useful in their current state but offer promising value in the future once data quality issues can be addressed through additional pre- or post-extraction solutions.

Availability of data and materials

The national CPCSSN data are available to approved researchers for a fee; for more information or to submit a Letter of Intent, visit:

The Alberta-specific CPCSSN data that were used for this analysis are available as two separate data sets through the regional networks (NAPCReN, SAPCReN). Data access procedures and requirements vary by network; contact the corresponding author for more information or visit: or



Abbreviations

ATC: Anatomical Therapeutic Chemical [classification system]
BMI: Body mass index
BP: Blood pressure
CCDSS: Canadian Chronic Disease Surveillance System
CHMS: Canadian Health Measures Survey
CPCSSN: Canadian Primary Care Sentinel Surveillance Network
dBP: Diastolic blood pressure
DIN: Drug Identification Number
EMR: Electronic medical record
HbA1C: Hemoglobin A1C (glycated hemoglobin)
HDL: High density lipoprotein
ICD-9: International Classification of Disease version 9
IQR: Interquartile range
LDL: Low density lipoprotein
mmHg: Millimetres of mercury
mmol/L: Millimoles per litre
NAPCReN: Northern Alberta Primary Care Research Network
PBRN: Practice-based research network
SAPCReN: Southern Alberta Primary Care Research Network
sBP: Systolic blood pressure
SD: Standard deviation


  1. Padwal RS, Bienek A, McAlister FA, Campbell NR. Epidemiology of hypertension in Canada: an update. Can J Cardiol. 2016;32(5):687–94.

    Article  Google Scholar 

  2. Public Health Agency of Canada. Report from the Canadian Chronic Disease Surveillance System: hypertension in Canada, 2010. Ottawa: Public Health Agency of Canada; 2010.

    Google Scholar 

  3. Atwood KM, Robitaille CJ, Reimer K, Dai S, Johansen HL, Smith MJ. Comparison of diagnosed, self-reported, and physically-measured hypertension in Canada. Can J Cardiol. 2013;29(5):606–12.

    Article  Google Scholar 

  4. Birtwhistle R, Williamson T. Primary care electronic medical records: a new data source for research in Canada. CMAJ. 2015;187(4):239–40.

    Article  Google Scholar 

  5. Godwin M, Williamson T, Khan S, Kaczorowski J, Asghari S, Morkem R, et al. Prevalence and management of hypertension in primary care practices with electronic medical records: a report from the Canadian primary care sentinel surveillance network. CMAJ Open. 2015;3(1):E76–82.

  6. Canada Health Infoway. Physicians’ use of digital health and information technologies in practice [Internet]. 2018 Canadian Physician Survey. 2018 [cited 2019 Feb 28]. Available from:

  7. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20(1):144–51.

  8. Singer A, Yakubovich S, Kroeker AL, Dufault B, Duarte R, Katz A. Data quality of electronic medical records in Manitoba: do problem lists accurately reflect chronic disease billing diagnoses? J Am Med Inform Assoc. 2016;23(6):1107–12.

  9. Greiver M, Aliarzadeh B, Meaney C, Moineddin R, Southgate CA, Barber DTS, et al. Are we asking patients if they smoke?: Missing information on tobacco use in Canadian electronic medical records. Am J Prev Med. 2015;49(2):264–8.

  10. Torti J, Duerksen K, Forst B, Salvalaggio G, Jackson D, Manca D. Documenting alcohol use in primary care in Alberta. Can Fam Physician. 2013;59(10):1128.

  11. Terry AL, Stewart M, Cejic S, Marshall JN, de Lusignan S, Chesworth BM, et al. A basic model for assessing primary health care electronic medical record data quality. BMC Med Inform Decis Mak. 2019;19:30.

  12. Tu K, Widdifield J, Young J, Oud W, Ivers NM, Butt DA, et al. Are family physicians comprehensively using electronic medical records such that the data can be used for secondary purposes? A Canadian perspective. BMC Med Inform Decis Mak. 2015;15:67.

  13. Garies S, Birtwhistle R, Drummond N, Queenan J, Williamson T. Data resource profile: National electronic medical record data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). Int J Epidemiol. 2017;46(4):1091–1092f.

  14. CPCSSN. Canadian Primary Care Sentinel Surveillance Network (CPCSSN) [Internet]. 2016 [cited 2019 Feb 14]. Available from:

  15. Greiver M, Williamson T, Barber D, Birtwhistle R, Aliarzadeh B, Khan S, et al. Prevalence and epidemiology of diabetes in Canadian primary care practices: a report from the Canadian primary care sentinel surveillance network. Can J Diabetes. 2014;38(3):179–85.

  16. Williamson T, Khan S, Manca D, Birtwhistle R, Wong ST, Patten S, et al. The diagnosis of depression and its treatment in Canadian primary care practices: an epidemiological study. CMAJ Open. 2014;2(4):E337–42.

  17. Williamson T, Green ME, Jordan KP, Khan S, Birtwhistle R, Peat G, et al. Prevalence and management of osteoarthritis in primary care: an epidemiologic cohort study from the Canadian primary care sentinel surveillance network. CMAJ Open. 2015;3(3):E270–5.

  18. Khan S, Garies S, Drummond N, Molnar F, Birtwhistle R, Williamson T. Prevalence and management of dementia in primary care practices with electronic medical records: a report from the Canadian primary care sentinel surveillance network. CMAJ Open. 2016;4(2):E177–84.

  19. Williamson T, Natajaran N, O’Donnell DE, Khan S, Cave A, Green ME, et al. Chronic obstructive pulmonary disease in primary care: an epidemiologic cohort study from the Canadian primary care sentinel surveillance network. CMAJ Open. 2015;3(1):E15–22.

  20. Garies S, Cummings M, Forst B, McBrien K, Soos B, Taylor M, et al. Achieving quality primary care data: a description of the Canadian Primary Care Sentinel Surveillance Network data capture, extraction, and processing in Alberta. Int J Popul Data Sci. 2019;4(2):1–8.

  21. Lewis S. A system in name only — access, variation, and reform in Canada’s provinces. N Engl J Med. 2015;372(6):497–500.

  22. Chang F, Gupta N. Progress in electronic medical record adoption in Canada. Can Fam Physician. 2015;61(12):1076–84.

  23. Canadian Medical Association. Family medicine profile [Internet]. 2018. Available from:

  24. Government of Alberta. Quarterly population report; Second quarter 2019 [Internet]. 2019 [cited 2019 Oct 21]. Available from:

  25. Queenan JA, Williamson T, Khan S, Drummond N, Garies S, Morkem R, et al. Representativeness of patients and providers in the Canadian primary care sentinel surveillance network: a cross-sectional study. CMAJ Open. 2016;4(1):E28–32.

  26. Williamson T, Green ME, Birtwhistle R, Khan S, Garies S, Wong ST, et al. Validating the 8 CPCSSN case definitions for chronic disease surveillance in a primary care database of electronic health records. Ann Fam Med. 2014;12(4):367–72.

  27. Kahn MG, Brown JS, Chun AT, Davidson BN, Meeker D, Ryan PB, et al. Transparent reporting of data quality in distributed data networks. EGEMS (Wash DC). 2015;3(1):7.

  28. Public Health Agency of Canada. Public health infobase: Canadian Chronic Disease Surveillance System (CCDSS) [Internet]. Ottawa: Canadian Chronic Disease Surveillance System; 2017. [cited 2019 Mar 12]. Available from:

  29. DeGuire J, Clarke J, Rouleau K, Roy J, Bushnik T. Blood pressure and hypertension. Health Rep. 2019;30(2):14–21.

  30. Statistics Canada. Health fact sheets: chronic conditions, 2016 [Internet]. Ottawa: Statistics Canada; 2017. Available from:

  31. Vink NM, Klungel OH, Stolk RP, Denig P. Comparison of various measures for assessing medication refill adherence using prescription data. Pharmacoepidemiol Drug Saf. 2009;18:159–65.

  32. Tobe SW, Stone JA, Anderson T, Bacon S, Cheng AY, Daskalopoulou SS, et al. Canadian Cardiovascular Harmonized National Guidelines Endeavour (C-CHANGE) guideline for the prevention and management of cardiovascular disease in primary care: 2018 update. CMAJ. 2018;190(40):E1192–206.

  33. Cave AJ, Davey C, Ahmadi E, Drummond N, Fuentes S, Kazemi-Bajestani SMR, et al. Development of a validated algorithm for the diagnosis of paediatric asthma in electronic medical records. NPJ Prim Care Respir Med. 2016;26:16085.

  34. Queenan JA, Farahani P, Ehsani-Moghadam B, Birtwhistle RV. The prevalence and risk for herpes zoster infection in adult patients with diabetes mellitus in the Canadian primary care sentinel surveillance network. Can J Diabetes. 2018;42(5):465–9.

  35. Lethebe BC, Williamson T, Garies S, McBrien K, Leduc C, Butalia S, et al. Developing a case definition for type 1 diabetes mellitus in a primary care electronic medical record database: an exploratory study. CMAJ Open. 2019;7(2):E246–51.

  36. Lix L, Munakala SN, Singer A. Automated classification of alcohol use by text mining of electronic medical records. Online J Public Health Inform. 2017;9(1):e069.

  37. Greiver M, Barnsley J, Aliarzadeh B, Krueger P, Moineddin R, Butt D, et al. Using a data entry clerk to improve data quality in primary care electronic medical records: a pilot study. Inform Prim Care. 2011;19(4):241–50.

  38. van der Bij S, Khan N, Ten Veen P, de Bakker DH, Verheij RA, Blumenthal D, et al. Improving the quality of EHR recording in primary care: a data quality feedback tool. J Am Med Inform Assoc. 2016;356(24):2527–34.

  39. Canadian Institute for Health Information. Pan-Canadian primary health care electronic medical record content standard, version 3.0 [Internet]. Ottawa: Canadian Institute for Health Information; 2014. Available from:

  40. Canadian Medical Association. How can Canada achieve enhanced use of electronic medical records? Toronto: Canadian Medical Association; 2014.


The authors would like to thank Dr. Michael Cummings for providing valuable comments on this manuscript, as well as Brian Forst and Larka Soos for providing the CPCSSN data sets. Lastly, the authors thank all participating CPCSSN sentinels for contributing de-identified EMR data and making this work possible.


SG is funded through an Alberta Innovates Health Solutions Graduate Studentship (2016–2020). The CPCSSN project, hosted by NAPCReN and SAPCReN, is funded by the Canadian Institutes of Health Research (CIHR) and Alberta Innovates through the Alberta Strategies for Patient Oriented Research (SPOR) Primary and Integrated Health Care Innovation Network, as well as the Public Health Agency of Canada. The funders had no role in the study design, data collection, analysis or interpretation of the data, or in the writing of the manuscript.

Author information

Authors and Affiliations



SG conceptualized the study, analysed and interpreted the data, and wrote the initial draft of the manuscript. KM, HQ, DM, ND, and TW contributed to the development of methods, interpretation, and revisions to the manuscript. All authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Stephanie Garies.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the University of Calgary’s Conjoint Health Research Ethics Board (REB17–1825) and the University of Alberta’s Health Research Ethics Board (Pro00079372). A waiver of individual patient consent was granted by the Research Ethics Board at each university affiliated with each participating CPCSSN practice-based research network for the collection and use of de-identified EMR data. Written consent was obtained from each sentinel participating in the CPCSSN project.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Garies, S., McBrien, K., Quan, H. et al. A data quality assessment to inform hypertension surveillance using primary care electronic medical record data from Alberta, Canada. BMC Public Health 21, 264 (2021).
