Performance of existing and novel surveillance case definitions for COVID-19 in household contacts of PCR-confirmed COVID-19

Background Optimized symptom-based COVID-19 case definitions that guide public health surveillance and individual patient management in the community may assist pandemic control. Methods We assessed diagnostic performance of existing case definitions (e.g. influenza-like illness, COVID-like illness) using symptoms reported from 185 household contacts of a PCR-confirmed case of COVID-19 in Wisconsin and Utah, United States. We stratified analyses between adults and children. We also constructed novel case definitions for comparison. Results Existing COVID-19 case definitions generally showed high sensitivity (86–96%) but low positive predictive value (PPV) (36–49%; F-1 score 52–63) in this community cohort. Top performing novel symptom combinations included taste or smell dysfunction and improved the balance of sensitivity and PPV (F-1 score 78–80). Performance indicators were generally lower for children (< 18 years of age). Conclusions Existing COVID-19 case definitions appropriately screened in household contacts with COVID-19. Novel symptom combinations incorporating taste or smell dysfunction as a primary component improved accuracy. Case definitions tailored for children versus adults should be further explored. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-021-11683-y.

COVID-19, ranging from none or mild indistinct symptoms to invasive neurological disease and fulminant respiratory failure [3][4][5][6][7]. As is common in the early response phases to novel emerging pathogens, there is ongoing need to reassess and refine surveillance case definitions for COVID-19 based on new information. Changes to case definitions affect interpretation of surveillance data, as was demonstrated by substantially different prevalence estimates when China broadened the COVID-19 case definition early in its epidemic response [2].
A few studies have demonstrated the predictive value of symptom profiles in healthcare workers [4,8,9] and in other populations that are not necessarily representative of the general public [5,10]. These studies are subject to other limitations, too. Some applied predictive models that included serum biomarkers and imaging [11,12]. Obtaining this information may limit real-world capture of people with mild-to-moderate SARS-CoV-2 infection and may delay public health intervention. Further, few studies to date have examined symptom combinations exclusively. Respiratory pathogens routinely behave differently in children and adults, and this appears to be true for COVID-19 as well [13]. For example, an assessment of ambulatory case surveillance definitions for influenza demonstrated lower sensitivity among children less than 5 years of age [14]. Similar analyses across age strata are lacking for COVID-19. Reliable, age-stratified syndromic surveillance definitions would likely help public health officials scale up community contact tracing and develop protocols to safely operate various congregate venues, such as schools and workplaces, should unlimited, timely diagnostic testing be unavailable.
Dedicated symptom-based surveillance systems have been developed to track COVID-19 cases. These include the U.S. Council of State and Territorial Epidemiologists (CSTE) original (CSTE combination 1; released April 5, 2020) and revised (CSTE combination 2; released August 7, 2020) clinical criteria for reporting SARS-CoV-2 infection, and the original U.S. Centers for Disease Control and Prevention (CDC) COVID-19-like illness (CLI) definition (Table 1). Similarly, CDC maintains a list of symptoms for priority SARS-CoV-2 testing. These COVID-19 case definitions and the priority testing symptom list are intended to capture as many persons with COVID-19 as possible with confirmatory testing. Finally, longstanding respiratory virus surveillance networks established to monitor influenza-like illness (ILI) and acute respiratory infection (ARI), the latter of which is used by the World Health Organization (WHO) for community-based syndromic surveillance of respiratory syncytial virus, may be plausibly adaptable platforms for monitoring COVID-19. The performance characteristics and utility of these syndromic surveillance platforms for COVID-19 have not been well defined [5].
We aimed to describe the diagnostic performance of two existing case definitions for COVID-19, the CDC COVID-19 symptom list, and two longstanding viral respiratory disease surveillance definitions among persons with confirmed SARS-CoV-2 exposure, stratified between adults and children. We also aimed to derive novel, practical symptom combinations in the same population. We interpreted the results primarily within the framework of two core public health surveillance functions: 1) symptom-based screening of individuals to guide SARS-CoV-2 diagnostic testing, contact tracing, and community-based isolation and quarantine, and 2) estimating disease frequency in persons with documented SARS-CoV-2 exposure. For symptom screening, we considered the merits of novel combinations when unlimited, timely diagnostic testing is unavailable.

Study design and data collection
CDC collaborated with state and local health departments in the Milwaukee, Wisconsin and Salt Lake City, Utah metropolitan areas in the United States to identify and enroll a convenience sample of people with laboratory-confirmed SARS-CoV-2 infection and their household contacts from March 22 to April 22, 2020. Ours are secondary analyses of this household transmission investigation whose questionnaire and methods were previously published in detail [15,16]. This activity was reviewed by CDC and was conducted consistent with applicable federal law and CDC policy. See e.g., 45 Code of Federal Regulations (CFR) part 46, 21 CFR part 56; 42 United States Code (USC) §241(d); 5 USC §552a; 44 USC §3501 et seq.
We administered questionnaires to household contacts to assess the presence of 15 symptoms during the 14 days prior to or at enrollment (day 0). Additionally, participants completed a daily symptom diary during days 1-14 after enrollment. We collected serum and upper respiratory specimens (i.e., both nasopharyngeal [NP] and anterior nares swabs) on day 0 and day 14. We additionally collected NP swabs at any interim date if any household contact newly developed or had worsening of any one of 15 symptoms consistent with COVID-19: nasal congestion or runny nose, sore throat, cough, chest pain, shortness of breath, discomfort while breathing, wheezing, headache, new loss of taste or smell, fever/chills, fatigue, muscle aches, diarrhea (≥3 loose stools per day), abdominal pain, or nausea/vomiting. The Milwaukee Health Department and Utah Public Health Laboratories tested the swabs using the CDC real-time Reverse Transcriptase Polymerase Chain Reaction (RT-PCR) assay for SARS-CoV-2 [17], and CDC tested sera using a CDC-developed SARS-CoV-2 enzyme-linked immunosorbent assay (ELISA) [18].

Definitions
We defined a household contact to be a COVID-19 case if they had at least one specimen test positive for SARS-CoV-2 by RT-PCR. We classified persons < 18 years of age as children, and persons ≥18 years of age as adults. We combined all symptoms recorded at any time prior to enrollment (and after the index case's symptom onset date) through the end of the 14-day observation period. We assessed individual symptoms, existing symptom combinations, and newly constructed symptom combinations for their association with SARS-CoV-2 test result by RT-PCR (Table 1). We asked enrollees to state whether they experienced any loss of taste and, separately, smell during the specified time period. For enrollees who responded yes to this question, we then asked whether the loss was partial or complete. We defined loss and/or dysfunction of taste or smell to include any level of loss, whether partial or complete. For ARI, we interpreted coryza as runny nose or nasal congestion.
Table 1 Existing COVID-19 case definitions, respiratory illness surveillance case definitions, and derived compound symptom combinations assessed for diagnostic performance in a community cohort of 185 individuals with household COVID-19 exposure in Utah and Wisconsin. Compound symptom combinations were derived from all symptoms recorded at any time prior to enrollment (and after the index case's symptom onset date) through the end of the 14-day observation period.

Analytic methods
We excluded household contacts from the main analysis if they were not present at enrollment or did not complete the study procedures. Our analysis of combinations predictive of COVID-19 included all 15 symptoms surveyed. We formally described the diagnostic performance of each individual symptom, existing COVID-19 case definitions, respiratory illness case definitions, and newly constructed symptom combinations (Table 1). The goal of assessing symptom combinations was to accurately divide the population into two groups: those who tested positive for SARS-CoV-2 and those who tested negative. To determine how well each definition would estimate prevalence in a syndromic surveillance system, we also calculated the difference between the number of positive symptom screens (i.e., true positives plus false positives, TP + FP) and the actual number of contacts who tested positive by RT-PCR. We assessed combinations across all ages and in children and adults separately, and we reported all measures on the percentage scale. To assess variability in each performance measure, we constructed bias-corrected and accelerated bootstrap confidence intervals [19] over 10,000 pseudosamples constructed by resampling households with replacement. We reported 95% confidence intervals with two exceptions. For measures estimated at 100% in the observed sample, we omitted confidence intervals, because the pseudosamples could not exhibit any variability. For the difference in specificity and sensitivity between adults and children, we reported 97.5% confidence intervals (a Bonferroni correction) to allow for a 95% joint confidence level regarding the differences in each pair. We adapted innovative methods previously applied in the low-resource context to derive a parsimonious symptom combination to prioritize diagnostic testing for tuberculosis [20]. We chose this approach to be as comprehensive as practical for COVID-19 in that it systematically assessed nearly every conceivable combination of symptoms.
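The performance measures used in this section can be written out from a 2x2 table of symptom screen result against RT-PCR result. The following is a minimal Python sketch, not the authors' published scripts: the names `Confusion`, `tally`, and `performance` are hypothetical, the ten-person data set is invented for illustration, and expressing the prevalence difference relative to the number of RT-PCR-positive contacts is an assumption about how the percentage scale was applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 counts for a symptom screen scored against RT-PCR."""
    tp: int  # screen positive, RT-PCR positive
    fp: int  # screen positive, RT-PCR negative
    fn: int  # screen negative, RT-PCR positive
    tn: int  # screen negative, RT-PCR negative


def tally(screen, pcr):
    """Cross-tabulate paired sequences of screen results and RT-PCR results."""
    tp = fp = fn = tn = 0
    for s, p in zip(screen, pcr):
        if s and p:
            tp += 1
        elif s:
            fp += 1
        elif p:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def _pct(num, den):
    """Ratio on the percentage scale; NaN when the denominator is zero."""
    return 100.0 * num / den if den else float("nan")


def performance(c):
    """Return the screening measures discussed in the text, in percent."""
    sens = _pct(c.tp, c.tp + c.fn)
    spec = _pct(c.tn, c.tn + c.fp)
    cases = c.tp + c.fn
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _pct(c.tp, c.tp + c.fp),
        "npv": _pct(c.tn, c.tn + c.fn),
        # F-1 is the harmonic mean of sensitivity and PPV.
        "f1": _pct(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "youden": sens + spec - 100.0,
        # Assumed normalization: (screen positives - true cases) / true cases.
        "prevalence_diff": _pct((c.tp + c.fp) - cases, cases),
    }


# Invented example: 4 RT-PCR-positive and 6 RT-PCR-negative contacts.
pcr = [True, True, True, True, False, False, False, False, False, False]
screen = [True, True, True, False, True, True, False, False, False, False]
result = performance(tally(screen, pcr))
```

In this toy example the screen flags five contacts when four are true cases, so the prevalence difference is +25%, while sensitivity is 75% and PPV is 60%.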
First, we searched over 245,000 combinations of between one and 15 symptoms (i.e., simple combinations of the form "at least m symptoms present out of n symptoms considered"). We gave greater weight to combinations with high F-1 score or high Youden's index. We then conducted an exhaustive search using pairs of these "m-of-n" combinations (i.e., compound combinations) to allow for more nuanced combinations. We limited this second search to single combinations of no more than five symptoms, such that the number of total symptoms evaluated for a compound combination was never more than ten. We allowed each pair of combinations to be joined by the logical operators [AND] and [OR], yielding approximately 73 million unique combination pairs. After the search, we selected four combination pairs to include in the primary analysis based on diagnostic performance and parsimony. We measured diagnostic performance by F-1 score (higher being better). We measured parsimony by the total number of symptoms evaluated (fewer being better) (Table 1).
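The "m-of-n" search described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the symptom keys (e.g., `taste_smell`, `short_breath`), and the four-person data set are hypothetical. The count assumes every threshold m from 1 to n is allowed for every subset of n symptoms, which gives 245,760 simple combinations for 15 symptoms and is consistent with the "over 245,000" figure; the sketch does not attempt to reproduce the count of roughly 73 million compound pairs, because the filtering applied before pairing is not detailed here.

```python
from itertools import combinations
from math import comb


def m_of_n(m, symptoms):
    """Build a rule that is True when at least m of the listed symptoms are present."""
    names = tuple(symptoms)

    def rule(person):
        return sum(bool(person.get(name, False)) for name in names) >= m

    return rule


def any_of(rule_a, rule_b):
    """Join two rules with logical OR."""
    return lambda person: rule_a(person) or rule_b(person)


def all_of(rule_a, rule_b):
    """Join two rules with logical AND."""
    return lambda person: rule_a(person) and rule_b(person)


def count_simple_rules(n_symptoms, max_size=None):
    """Number of (m, subset) pairs with 1 <= m <= len(subset) <= max_size."""
    top = n_symptoms if max_size is None else max_size
    return sum(comb(n_symptoms, n) * n for n in range(1, top + 1))


def simple_rules(symptoms, max_size):
    """Yield every (m, subset) specification using up to max_size symptoms."""
    for n in range(1, max_size + 1):
        for subset in combinations(symptoms, n):
            for m in range(1, n + 1):
                yield m, subset


def f1_percent(rule, people, pcr):
    """F-1 score (percent) of a rule against RT-PCR results."""
    tp = fp = fn = 0
    for person, positive in zip(people, pcr):
        flagged = rule(person)
        if flagged and positive:
            tp += 1
        elif flagged:
            fp += 1
        elif positive:
            fn += 1
    den = 2 * tp + fp + fn
    return 100.0 * 2 * tp / den if den else 0.0


def best_simple_rule(symptoms, people, pcr, max_size):
    """Exhaustive search: highest F-1 first, then fewest symptoms (parsimony)."""
    return max(
        simple_rules(symptoms, max_size),
        key=lambda spec: (f1_percent(m_of_n(*spec), people, pcr), -len(spec[1])),
    )


# Derived compound combination 3 expressed as a pair of m-of-n rules joined by OR.
derived_3 = any_of(
    m_of_n(1, ["taste_smell"]),
    m_of_n(2, ["short_breath", "wheezing", "discomfort_breathing", "fever_chills"]),
)

# Invented data: taste/smell dysfunction separates cases perfectly, cough does not.
people = [
    {"taste_smell": True, "cough": True},
    {"taste_smell": True, "cough": False},
    {"taste_smell": False, "cough": True},
    {"taste_smell": False, "cough": False},
]
pcr = [True, True, False, False]
best = best_simple_rule(["taste_smell", "cough"], people, pcr, max_size=2)
```

Representing each rule as a plain function keeps the compound step simple: any two simple rules can be joined with `any_of` or `all_of` and scored with the same `f1_percent` call.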
We performed all calculations in R 4.0.0 (R Core Team), Python 3.7 (Python Software Foundation), or both. To allow for parallel processing, the exhaustive combinatorial search and bootstrap confidence intervals were implemented on a scientific workstation with 24 logical cores and 64 GB of RAM. De-identified data and analytic scripts in R and Python are publicly available through a GitHub repository: https://github.com/scotthlee/covid-casedefs.
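Resampling whole households, rather than individual contacts, keeps members of the same household together in every pseudosample. The sketch below illustrates that idea in Python with a simple percentile interval; it deliberately substitutes the percentile method for the bias-corrected and accelerated intervals used in the analysis, and the function names and toy households are hypothetical.

```python
import random


def sensitivity(records):
    """Sensitivity (percent) from (screen_positive, pcr_positive) pairs."""
    tp = sum(1 for screen, pcr in records if screen and pcr)
    fn = sum(1 for screen, pcr in records if not screen and pcr)
    return 100.0 * tp / (tp + fn) if tp + fn else float("nan")


def household_bootstrap_ci(households, statistic, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval, resampling households with replacement.

    households: list of households, each a list of (screen, pcr) pairs.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = []
        for _household in range(len(households)):
            # Draw one whole household and keep all of its members together.
            sample.extend(rng.choice(households))
        value = statistic(sample)
        if value == value:  # skip NaN (e.g., a pseudosample with no cases)
            draws.append(value)
    draws.sort()
    alpha = 1.0 - level
    last = len(draws) - 1
    return draws[int(alpha / 2 * last)], draws[int((1 - alpha / 2) * last)]


# Invented households of (screen_positive, pcr_positive) pairs.
households = [
    [(True, True), (False, False)],
    [(True, True), (True, False), (False, True)],
    [(False, True)],
    [(True, True), (True, True)],
    [(False, False), (True, False)],
]

ci_95 = household_bootstrap_ci(households, sensitivity)
# Bonferroni correction for a pair of comparisons: 1 - 0.05 / 2 = 0.975.
ci_975 = household_bootstrap_ci(households, sensitivity, level=1 - 0.05 / 2)
# A measure observed at 100% in every household cannot vary across pseudosamples.
degenerate = household_bootstrap_ci(
    [[(True, True)], [(True, True), (False, False)]], sensitivity
)
```

The last call mirrors the reason given above for omitting intervals when a measure is estimated at 100%: every pseudosample returns the same value, so the interval collapses to a single point.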

Adult-child differences in discriminatory performance
The accuracy of symptom profiles for defining RT-PCR confirmed COVID-19 differed by age (Table 3, Fig. 2). Overall, existing case definitions were less sensitive in children compared to adults. One exception, the CDC symptom list for priority testing (Table 1), captured all COVID-19 cases regardless of age. The existing case definitions were more specific in children, but the greater specificity was statistically significant for CSTE combination 1 only. Individual symptoms showed a similar pattern of lower sensitivity among children, notably taste/smell dysfunction. Sore throat was more sensitive in children, and fever/chills and nausea were similar regardless of age group. We observed a similar pattern of increased specificity for most derived symptom combinations in children (Table 3, Fig. 2). Cough was the sole symptom where the difference in both sensitivity and specificity was statistically significant. For both children and adults, the CLI case definition provided the greatest balance between both sensitivity and specificity (Youden's Index 53%; 95% CI 8-80% vs. 52%; 95% CI 36-66%, respectively) and harmonization of sensitivity with PPV (F-1 61%; 95% CI 26-83% vs. 63%; 95% CI 49-76%, respectively) (Table 2). CLI also most accurately predicted overall prevalence amongst children (percent difference from true prevalence 36%; 95% CI -17 to 157%) (Table 3).

Discussion
Existing case surveillance definitions for COVID-19, as shown in Table 1, were generally sensitive in our study conducted among household contacts of infected persons, a population with proven SARS-CoV-2 exposure. However, they tended to have low specificity and poorly estimated disease prevalence. By systematically screening novel definitions that optimized sensitivity, specificity, and PPV, we improved community prevalence estimation and overall accuracy of individual screening, which could be useful if diagnostic testing is limited. In particular, we affirmed loss or dysfunction of taste or smell as a uniquely discerning characteristic central to constructing an effective, concise case surveillance definition when applied across all age groups (i.e., derived compound combination 3). An appropriate discriminatory balance between sensitivity and specificity for a newly emerging pathogen depends on the objectives of the surveillance activity [21].
Table 1 footnotes, derived compound symptom combinations:
r Compound symptom combinations were derived from all symptoms recorded at any time prior to enrollment (and after the index case's symptom onset date) through the end of the 14-day observation period.
s Derived compound combination 1: Taste and/or smell dysfunction, OR one of the following: shortness of breath, myalgia, or fever or chills.
t Derived compound combination 2: Taste and/or smell dysfunction or discomfort breathing, OR at least two of the following: shortness of breath, wheezing, or fever/chills.
u Derived compound combination 3: Taste and/or smell dysfunction, OR at least two of the following: shortness of breath, wheezing, discomfort breathing, or fever/chills.
v Derived compound combination 4: Taste and/or smell dysfunction, OR shortness of breath and fever/chills.
w NA: not applicable.
Highly sensitive case definitions capture a larger proportion of true COVID-19 cases, which is ideal when diagnostic testing is widely available and results are timely. Highly sensitive definitions, however, generally rule in a larger number of non-cases (i.e., false-positive [FP] symptom screens) [22]. In addition to testing resources, the public health system's tolerance for false-positive screens is, of course, dependent on human resources. This is especially apparent when intensive interventions involve extensive contact tracing, isolation and quarantine. At high community COVID-19 prevalence, these intensive mitigation efforts may benefit from evidence-based prioritization. For example, CSTE combination 2 had an FP symptom screening rate (77/136; 57%) approximately eight times that for derived compound combination 3 (10/136; 7%) in our cohort. At the population level, such differences could expose shortcomings in resources for core interventions, such as universal contact tracing. The COVID-19 response has repeatedly been strained in these requisite areas [23][24][25]. Novel symptom screening criteria that more tightly couple sensitivity and specificity (i.e., diagnostic accuracy), such as the derived compound combination 3, could help to prioritize interventions when strategically deployed. This principle may also apply when evaluating novel vaccines or therapeutics in large clinical trials involving thousands of participants, where feasibility constraints often dictate the use of symptom-prioritized testing to confirm outcomes. Still, highly sensitive symptom rules, such as CSTE combination 2, are preferred for COVID-19 when resources are unlimited.
When syndromic surveillance systems were used to estimate community burden, the highly sensitive existing case definitions overestimated true burden. Conversely, highly specific case definitions, such as ILI, may detect changes in disease trends over time but underestimate true burden [21]. ILI underestimated disease prevalence by more than 80% in this study population. Current laboratory-based surveillance grossly under-ascertains incidence [26], especially where diagnostic testing is not easily accessible or widespread. Retailoring community-based syndromic surveillance systems already in place [27] (i.e., altering the symptoms included or applying a correction factor based on results such as ours) would more accurately reflect true burden.
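The false-positive screening rates quoted above for CSTE combination 2 and derived compound combination 3 can be checked with a few lines of arithmetic. The Python sketch below uses only the counts given in the text (77/136 and 10/136) and assumes that the denominator of 136 is the number of RT-PCR-negative contacts; the helper names and the projection to 1,000 non-cases are hypothetical illustrations, not results from the study.

```python
def percent(numerator, denominator):
    """Simple proportion on the percentage scale."""
    return 100.0 * numerator / denominator


NON_CASES = 136  # denominator quoted in the text (assumed: RT-PCR-negative contacts)

fp_rate_cste2 = percent(77, NON_CASES)     # about 57%
fp_rate_derived3 = percent(10, NON_CASES)  # about 7%
fold = fp_rate_cste2 / fp_rate_derived3    # 7.7, i.e., roughly eight-fold


def expected_false_positives(fp_rate_percent, non_cases_screened):
    """Hypothetical projection: FP screens expected among a number of non-cases."""
    return round(fp_rate_percent / 100.0 * non_cases_screened)


per_1000_cste2 = expected_false_positives(fp_rate_cste2, 1000)
per_1000_derived3 = expected_false_positives(fp_rate_derived3, 1000)
```

Under these assumptions, screening 1,000 uninfected contacts would generate several hundred more follow-up actions with the more sensitive definition, which is the resource trade-off discussed above.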
For most symptoms and their combinations, overall performance, most notably sensitivity, differed between child and adult household contacts. These findings are consistent with prior observations whereby children generally show fewer and milder symptoms of COVID-19 compared with adults [28], and COVID-19 syndromes vary across ages [13]. The small number of children with COVID-19 in this cohort precludes specific recommendations, but further examination of the utility of age-specific case definitions is warranted when considering policies for schools and other child congregate settings, and for deriving accurate burden estimates from syndromic surveillance.
While the number of individuals in this study is relatively small, our study population is well-characterized. We collected extensive symptom data, which yielded a comprehensive assessment of multiple symptom combinations. We also employed inclusion criteria that were not based on disease status or symptom status, and a reference category based on standardized laboratory testing. Nonetheless, we acknowledge this study's limitations. These analyses were not intended to produce definitive symptom combinations to be applied to the general public; however, they may be used to guide the development of future candidate case definitions. One key consideration for future validation efforts is that enrollment started immediately after the precipitous decline in laboratory-confirmed influenza virus infections in the United States in mid-March 2020 [29]. Thus, our estimates of diagnostic performance may differ during the viral respiratory season. In addition, COVID-19 prevalence was higher for our study population (i.e., contacts of laboratory-confirmed household members) compared to the entire community, thereby limiting the generalizability of predictive values (although sensitivity and specificity remain unaffected by disease prevalence). Our study population was younger than the general population and the screening criteria may perform differently in older adults. We did not have enough older household contacts to permit further stratification among adults. Finally, screening criteria applied to persons seeking medical care may also perform differently, as those individuals probably tend to have more severe illness.
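The point that predictive values depend on prevalence, while sensitivity and specificity do not, follows from Bayes' rule. The Python sketch below holds sensitivity and specificity fixed and varies prevalence; the values 0.90 and 0.50 and the prevalence grid are hypothetical and chosen only for illustration, not taken from the study results.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule (all inputs as proportions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' rule (all inputs as proportions)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical screen: sensitive but not very specific.
SENS, SPEC = 0.90, 0.50

# Same screen applied at different (hypothetical) prevalence levels.
ppv_by_prevalence = {
    prev: ppv(SENS, SPEC, prev) for prev in (0.01, 0.05, 0.25, 0.50)
}
```

With sensitivity and specificity unchanged, the PPV in this sketch rises from under 2% at 1% prevalence to about 64% at 50% prevalence, which is why predictive values estimated among household contacts may not transfer directly to lower-prevalence settings.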
Fig. 2 Sensitivity and 100% minus specificity for individual COVID-19 symptoms, existing case definitions, and derived compound symptom combinations for a community cohort of 122 adults (upper case letters) and 63 children (lower case letters) with household exposure to COVID-19 in Utah and Wisconsin, United States, March-May 2020. Specificity is the probability of testing negative when disease is absent. Sensitivity is the probability of testing positive when disease is present. CDC symptom list=U.S. Centers for Disease Control and Prevention (CDC) list of symptoms that may indicate COVID-19 infection (https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html). This symptom list was last updated on 13 May 2020. ARI=World Health Organization (WHO) acute respiratory infection (ARI) definition for community-based respiratory syncytial virus (RSV) surveillance (https://www.who.int/influenza/rsv/rsv_case_definition/en/). Last updated 04 February 2020. CSTE combination 1=U.S. Council of State and Territorial Epidemiologists (CSTE) original clinical criteria for COVID-19 reporting (https://cdn.ymaws.com/www.cste.org/resource/resmgr/2020ps/interim-20-id-01_covid-19.pdf). This interim position statement (Interim-20-ID-01) was approved on 05 April 2020 and was replaced by Interim-20-ID-02 on 07 August 2020. CSTE combination 2=U.S. Council of State and Territorial Epidemiologists (CSTE) revised clinical criteria for COVID-19 reporting (https://cdn.ymaws.com/www.cste.org/resource/resmgr/ps/positionstatement2020/interim-20-id-02_COVID-19.pdf). This interim position statement (Interim-20-ID-02) was approved on 07 August 2020 and replaced Interim-20-ID-01. CLI=U.S. Centers for Disease Control and Prevention (CDC) COVID-19-like illness (CLI) definition, which was used to guide early diagnostic testing strategies from 17 January 2020 to 08 March 2020 (https://emergency.cdc.gov/han/han00426.asp). ILI=Influenza-like illness (ILI) outpatient visit information collected through the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet) (https://www.cdc.gov/flu/weekly/overview.htm#anchor_1539281266932). This collaborative effort between CDC, state and local health departments, and healthcare providers has been tracking patients with ILI since the 1997-1998 influenza season. Derived compound combination 1: Taste and/or smell dysfunction, OR one of the following: shortness of breath, myalgia, or fever or chills. Derived compound combination 2: Taste and/or smell dysfunction or discomfort breathing, OR at least two of the following: shortness of breath, wheezing, or fever/chills. Derived compound combination 3: Taste and/or smell dysfunction, OR at least two of the following: shortness of breath, wheezing, discomfort breathing, or fever/chills. Derived compound combination 4: Taste and/or smell dysfunction, OR shortness of breath and fever/chills. Points closest to the upper left corner represent those with the highest sensitivity and specificity values.
Additionally, we showed that existing COVID-19 case definitions are highly sensitive and do well to screen in persons for testing and individual-level public health interventions like community isolation. In the first such endeavor for evaluating and deriving novel COVID-19 case surveillance definitions in a community setting among SARS-CoV-2-exposed individuals with largely mild illness, we evaluated novel symptom combinations for COVID-19 using methodology previously applied to tuberculosis in low resource settings [20]. These derived combinations and CSTE combination 2 better estimated disease burden and used taste and/or smell dysfunction as a primary component. The latter is supported by prior studies [5,8,9,10].
Because most SARS-CoV-2 infections are mild [30] and core public health functions may need prioritization when testing and other resources are limited, case definitions that accurately determine COVID-19 status in the general public may assist continued interruption of community transmission [31]. When timely diagnostic testing is readily available, however, using less sensitive screening tools could inappropriately miss cases and lead to further community transmission.
Our study population, which includes participants enrolled independent of disease and symptom status, may better reflect the diagnostic performance in the general population than previously published research. Accurate clinical case definitions are likely to also apply to large clinical trials for candidate vaccines and therapeutics where serial confirmatory SARS-CoV-2 testing for any new symptom is impractical. It is important that our results be validated against the growing body of larger ambulatory surveillance databases in diverse communities and in other countries; in particular, our methodology should be assessed in the context of the annual influenza season, at varying community COVID-19 prevalence, and across the age spectrum. Such studies ideally can be accompanied by cost-effectiveness modeling of intervention strategies.

Conclusions
The discriminatory performance of case surveillance definitions for COVID-19 is important for implementing effective epidemic mitigation strategies. Our study illustrates the performance of case definitions in community members with household exposure to SARS-CoV-2 based solely on symptom profiles. Prior work overrepresented healthcare workers or otherwise studied non-representative populations, and did not examine both adults and children. Our study also provides a novel framework for refining definitions. Using 15 symptoms associated with COVID-19 for all contacts regardless of disease status, we systematically evaluated the discriminatory performance of individual symptoms and previously defined case surveillance definitions in adults and children, and according to two core surveillance applications: 1) screening non-hospitalized individuals to prioritize public health interventions, and 2) estimating the number of non-hospitalized persons with COVID-19 (i.e., community-based syndromic surveillance). We also constructed novel symptom combinations that effectively performed both functions and, in this study population, improved upon widely used case surveillance definitions; these combinations may help to target interventions in the absence of unlimited laboratory diagnostic capacity. Based on our results, case surveillance definition performance may increase if definitions are developed separately for adults and children.