Data sources
We conducted a point prevalence study among the UK general population using the Clinical Practice Research Datalink (CPRD) GOLD dataset, an anonymised sample of electronic health records from primary care practices across the UK [17]. The dataset includes diagnoses recorded using Read codes, primary care prescribing, and results of tests ordered in primary care. Data validity has been shown to be high [18]. The UK has universal healthcare, and the population included in CPRD GOLD was found to be nationally representative by age and sex in March 2011; we re-assessed representativeness in 2019 in a sensitivity analysis [17].
Secondary care (hospital) data linkage is available for approximately 75% of CPRD GOLD-registered individuals in England, based on practice-level consent. For patients admitted to hospital, the Hospital Episode Statistics Admitted Patient Care dataset records diagnoses using International Classification of Diseases (ICD-10) codes, and procedures such as chemotherapy using Classification of Interventions and Procedures (OPCS-4) codes [19].
The CPRD Pregnancy Register uses validated algorithms, combining information across the primary care record such as antenatal scans, expected delivery dates, and deliveries, terminations and miscarriage records, to date and characterise pregnancies in CPRD GOLD [20].
Index dates
Our primary analysis index date was 5 March 2019, chosen to give up-to-date national prevalence estimates. CPRD GOLD coverage peaked in 2014, when it included approximately 7% of the UK population; by 2019 the dataset was smaller and did not cover all regions in England. Because the 2014 dataset therefore offered greater power than the 2019 dataset, as well as full regional representation across England, we repeated point prevalence estimates for 5 March 2014 as a sensitivity analysis.
Pregnancy was described for the index date of 5 March 2014 only, not 5 March 2019, since the latest Pregnancy Register update was in February 2018.
Study population
The study population comprised individuals aged 2–100 years with a current registration and a record meeting CPRD quality criteria (acceptable patient record and practice up to standard) in CPRD GOLD, with at least 1 year’s prior registration to allow recording of underlying conditions [21]. Eligibility started on the latest of: 1 January 2019, second birthday, a year after registration, or practice meeting CPRD quality standards. Eligibility ended at the earliest of: 5 March 2019, hundredth birthday, death, leaving the practice, or last data collection from the practice. Individuals with any time eligible between 1 January and 5 March were included in the main analysis of point prevalence on 5 March to increase study power, with a sensitivity analysis limited to individuals active in the dataset on 5 March 2019.
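The eligibility rule above (latest of the start dates, earliest of the end dates) can be sketched as follows. This is a minimal illustration rather than the study code: the function and argument names are hypothetical, and the handling of 29 February birthdays and of inclusive end dates are assumptions not stated in the text.

```python
from datetime import date

WINDOW_START = date(2019, 1, 1)  # start of the inclusion window
INDEX_DATE = date(2019, 3, 5)    # primary analysis index date


def add_years(d, years):
    """Return the anniversary of d; 29 February maps to 28 February (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def eligibility_window(birth, registered, practice_up_to_standard,
                       last_collection, left_practice=None, died=None):
    """Return (start, end) of eligibility, or None if never eligible in the window."""
    # Eligibility starts on the latest of the four start criteria.
    start = max(WINDOW_START, add_years(birth, 2),
                add_years(registered, 1), practice_up_to_standard)
    # Eligibility ends on the earliest of the end criteria that apply.
    ends = [INDEX_DATE, add_years(birth, 100), last_collection]
    ends += [d for d in (left_practice, died) if d is not None]
    end = min(ends)
    return (start, end) if start <= end else None


def active_on_index(window):
    """Sensitivity analysis restriction: still eligible on the index date."""
    return window is not None and window[1] == INDEX_DATE
```

Under these assumptions, someone who left their practice on 1 February 2019 has a non-empty window and so enters the main analysis, but `active_on_index` returns `False` for them, which excludes them from the sensitivity analysis.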
For pregnancy, the study population comprised women aged 11–49 years. As pregnancy is transient, women were required to be registered in the dataset on 5 March 2014, rather than any time between 1 January and 5 March 2014.
Definition of at-risk population
In national guidance, all individuals aged ≥70 years are considered at moderate risk (Table 1) [7]. Since age-specific population estimates are readily available, the primary analysis for this study defined at-risk status based on underlying health conditions alone, rather than age. An additional analysis estimated the size of the at-risk population including all individuals aged ≥70 years.
We defined the COVID-19 at-risk population as individuals with at least one underlying health condition conferring moderate or high risk of severe COVID-19 according to national guidance (Table 1). These conditions were: any history of chronic respiratory disease (excluding asthma), heart disease, kidney disease, neurological conditions such as multiple sclerosis, or diabetes mellitus; or current asthma, severe obesity, or immunosuppression; all assessed on the index date [7].
Underlying conditions were defined using diagnoses, height and weight measurements, test results, and prescriptions recorded in primary care for the main analysis. Pregnancy status was ascertained from the CPRD Pregnancy Register (Supplementary Table 1, Additional File 1). Individuals with no recorded body mass index were included in the analysis, categorised as having no evidence of severe obesity. For analysis using linked secondary care data, diagnoses and procedures recorded in secondary care were additionally ascertained from ICD-10 and OPCS-4 codes respectively.
Multimorbidity was defined as more than one condition among the following domains: asthma or other chronic respiratory disease; chronic heart disease; chronic kidney disease; chronic liver disease; chronic neurological disease; diabetes; or immunosuppression (including individuals with dysplenia and organ transplant recipients).
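The multimorbidity definition counts domains rather than individual diagnoses, so asthma together with another chronic respiratory disease counts once. A minimal sketch of that counting rule, assuming each person is represented by a set of condition flags (the flag names below are hypothetical labels, not CPRD variables):

```python
# Each domain groups the condition flags that count only once towards multimorbidity.
DOMAINS = {
    "respiratory": {"asthma", "other_chronic_respiratory_disease"},
    "heart": {"chronic_heart_disease"},
    "kidney": {"chronic_kidney_disease"},
    "liver": {"chronic_liver_disease"},
    "neurological": {"chronic_neurological_disease"},
    "diabetes": {"diabetes"},
    "immunosuppression": {"immunosuppression", "dysplenia", "organ_transplant"},
}


def domains_present(conditions):
    """Return the names of the domains in which the person has any condition."""
    conditions = set(conditions)
    return {name for name, members in DOMAINS.items() if members & conditions}


def is_multimorbid(conditions):
    """True if conditions fall in more than one of the listed domains."""
    return len(domains_present(conditions)) > 1
```

In this sketch, flags outside the listed domains (for example severe obesity) are ignored when counting, which follows the domain list as written above.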
Cancer survivors have an increased risk of COVID-19 mortality, but non-haematological cancer survivors are only included in current COVID-19 guidance if receiving immunosuppressing treatment (Table 1) [2]. Separately from the study at-risk definition, we therefore described the prevalence of any new cancer diagnosis in the past one and five years.
Statistical analysis
Point prevalence estimates of the at-risk population and each underlying condition on 5 March 2019 were calculated per 100,000 with binomial exact 95% confidence intervals, for each nation in the UK. The at-risk population prevalence was stratified by sex and age, categorised in 5-year bands except 2–9 years and 90–99 years. Prevalence estimates for the at-risk population and each condition were stratified by age and region, separately and in combination. Prevalence estimates based on fewer than five individuals were suppressed to preserve confidentiality.
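For readers unfamiliar with binomial exact intervals, the calculation can be illustrated with a Clopper-Pearson interval found by bisection on the binomial distribution function. This is a standard-library sketch, not the study's Stata code: the function names are hypothetical, the loop over counts is slow for very large numerators, and treating every count below five (including zero) as suppressed is an assumption.

```python
import math


def _log_pmf(i, n, p):
    """Log of the Binomial(n, p) probability of exactly i events, 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def _cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1."""
    return min(1.0, math.fsum(math.exp(_log_pmf(i, n, p)) for i in range(k + 1)))


def _solve(f, target):
    """Bisection for p in (0, 1) such that f(p) == target, with f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid == lo or mid == hi:  # no further floating-point resolution
            break
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson (binomial exact) interval for x events among n people."""
    lower = 0.0 if x == 0 else _solve(lambda p: _cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve(lambda p: _cdf(x, n, p), alpha / 2)
    return lower, upper


def prevalence_per_100k(x, n, min_count=5):
    """Return (estimate, lower, upper) per 100,000, or None if suppressed."""
    if x < min_count:
        return None  # assumption: all counts below five are suppressed
    lower, upper = exact_ci(x, n)
    return tuple(round(v * 100_000, 1) for v in (x / n, lower, upper))
```

A call such as `prevalence_per_100k(50, 1000)` returns the point estimate with its lower and upper limits per 100,000, while `prevalence_per_100k(3, 1000)` returns `None` to mark a suppressed cell.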
For the additional analysis estimating the size of the at-risk population including all individuals aged ≥70 years, the at-risk prevalence among individuals aged 2–69 years was age-standardised in 5-year bands, and added to the population aged ≥70 years, using mid-2019 national population estimates [22]. Comparison of prevalence between 2014 and 2019 was stratified by region to account for the change in regional representation of the dataset over time. The point prevalence of pregnancy and underlying health conditions was estimated among women aged 11–49 years on 5 March 2014. Prevalence estimates with and without linked secondary care records were compared among individuals at practices in England which had consented to data linkage.
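The arithmetic behind the additional analysis can be sketched as below: band-specific at-risk prevalence among those aged 2–69 years is applied to the national population in each band, and the whole population aged ≥70 years is then added. The function name, band labels and any numbers passed in are hypothetical; none are study data.

```python
def at_risk_population_size(prevalence_by_band, population_by_band, population_70_plus):
    """Estimate the national at-risk count, including everyone aged >= 70 years.

    prevalence_by_band: at-risk proportion (0 to 1) for each age band under 70.
    population_by_band: national population estimate for the same age bands.
    population_70_plus: national population aged >= 70 years, all counted as at risk.
    """
    if set(prevalence_by_band) != set(population_by_band):
        raise ValueError("age bands must match between prevalence and population")
    # Expected number at risk in each band under 70, summed across bands.
    under_70 = sum(prevalence_by_band[band] * population_by_band[band]
                   for band in prevalence_by_band)
    return under_70 + population_70_plus
```

Applying study prevalence to national band sizes, rather than to the study's own age distribution, is what makes the under-70 component age-standardised to the national population.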
Sensitivity analyses
CPRD GOLD was nationally representative by age and sex in March 2011 [17]. To update this assessment, the 2019 study population was compared to mid-2019 national population estimates, and 2019 at-risk prevalence estimates were directly age-standardised in 5-year bands using mid-2019 population estimates for each nation [22].
The main analysis included individuals eligible for any period of time between 1 January and 5 March 2019. Individuals who left CPRD between 1 January and 5 March would not subsequently have had new diagnoses recorded, which could lead to underestimation of point prevalence on 5 March. As a sensitivity analysis, at-risk prevalence was estimated with the study population restricted to individuals who were still registered in CPRD on 5 March 2019.
All analyses were conducted using Stata 16 MP.