A methodological framework for the evaluation of syndromic surveillance systems: a case study of England

Background
Syndromic surveillance complements traditional public health surveillance by collecting and analysing health indicators in near real time. The rationale of syndromic surveillance is that it may detect health threats faster than traditional surveillance systems, permitting more timely, and hence potentially more effective, public health action. The effectiveness of syndromic surveillance largely relies on the methods used to detect aberrations. Very few studies have evaluated the performance of syndromic surveillance systems, and consequently little is known about the types of events that such systems can and cannot detect.

Methods
We introduce a framework for the evaluation of syndromic surveillance systems that can be used in any setting and that is based upon the use of simulated scenarios. For a range of scenarios, this allows the time to detection and the probability of detection to be determined, with uncertainty fully incorporated. In addition, we demonstrate how such a framework can model the benefits of increases in the number of centres reporting syndromic data and can determine the minimum size of outbreaks that can or cannot be detected. Here, we demonstrate its utility using simulations of national influenza outbreaks and localised outbreaks of cryptosporidiosis.

Results
Influenza outbreaks are consistently detected, with larger outbreaks being detected in a more timely manner. Small cryptosporidiosis outbreaks (<1000 symptomatic individuals) are unlikely to be detected. We also demonstrate the advantages of having multiple syndromic data streams (e.g. emergency attendance data, telephone helpline data, general practice consultation data), as different streams are able to detect different outbreak types with different efficacy (e.g. emergency attendance data are useful for the detection of pandemic influenza but not for outbreaks of cryptosporidiosis). We also highlight that, for any one disease, the utility of data streams may vary geographically, and that the detection ability of syndromic surveillance varies seasonally (e.g. an influenza outbreak starting in July is detected sooner than one starting later in the year). We argue that our framework constitutes a useful tool for public health emergency preparedness in multiple settings.

Conclusions
The proposed framework allows the exhaustive evaluation of any syndromic surveillance system and constitutes a useful tool for emergency preparedness and response.

Electronic supplementary material
The online version of this article (10.1186/s12889-018-5422-9) contains supplementary material, which is available to authorized users.

imported cases. Susceptible individuals (S) are then infected at a rate β (the per capita rate at which two individuals come into effective contact [1]) after contact with an infectious person, irrespective of whether that infectious person is symptomatic (I) or asymptomatic (A).

Figure S1: Schematic representation of the compartmental models used in the study. The model at the top (A) was used to simulate outbreaks of pandemic influenza. The model at the bottom (B) was used to simulate outbreaks of cryptosporidiosis.

Once infected, individuals become latent (L) carriers of the disease. In latent carriers (L), the disease incubates for a period of σ days. A proportion (p) of latent individuals becomes infectious and symptomatic (I) at a rate of 1/σ per day (one divided by the length of the incubation period). The remainder (1 − p) become infectious but asymptomatic (A), also at a rate of 1/σ per day. Asymptomatic individuals have their infectivity reduced by a factor (k). Infectious individuals, both symptomatic and asymptomatic, recover (R) at an average rate of 1/γ per day, where γ represents the length of the infectious period (in days). The equations used for the pandemic influenza model were as follows [2]:

dS/dt = −βS(I + kA)/N
dL/dt = βS(I + kA)/N − L/σ
dI/dt = pL/σ − I/γ
dA/dt = (1 − p)L/σ − A/γ
dR/dt = (I + A)/γ

where N = S + L + I + A + R denotes the (constant) total population size, β the effective contact rate, σ the incubation period, p the proportion of latent carriers who become symptomatic, k the relative infectivity of asymptomatic individuals, and γ the infectious period.

Modelling assumptions

The proposed models have the following shared assumptions:

1. There are no changes in the population size. We make this assumption because significant changes in population size are unlikely over the short period of the simulations.
4. There is no shedding of oocysts from the infectious people into the drinking water system.
5. There are no secondary infections due to person-to-person contact.
6. All oocysts in the water source are viable and infective.

Parameter values were based on previous studies and expert knowledge [3,9,10,11,12]. Each day, a Poisson-distributed random number of people (with mean λ) was assumed to be exposed to the contaminated water source.

Several of the simulated Cryptosporidium spp. outbreaks shown in Figure S2 were considerably larger than the historical outbreak, whilst most of the simulated pandemic influenza outbreaks were smaller than their corresponding historical data. Figure S2, however, demonstrates that once
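Returning to the model definitions above, the following is a minimal simulation sketch of the pandemic influenza equations (a simple Euler integration). The function name and all parameter values are illustrative placeholders, not those used in the paper; the cryptosporidiosis model would additionally seed a Poisson(λ)-distributed number of newly exposed people each day, as described above.

    def simulate_sliar(beta, sigma, gamma, p, k, N, I0, days, dt=0.1):
        """Euler integration of the S-L-I-A-R equations given above.

        beta  -- effective contact rate (per day)
        sigma -- incubation period (days)
        gamma -- infectious period (days)
        p     -- proportion of latent carriers who become symptomatic
        k     -- relative infectivity of asymptomatic individuals
        """
        S, L, I, A, R = N - I0, 0.0, float(I0), 0.0, 0.0
        daily_symptomatic = []
        steps = int(round(1 / dt))            # integration steps per day
        for _ in range(days):
            for _ in range(steps):
                foi = beta * (I + k * A) / N  # force of infection
                dS = -foi * S
                dL = foi * S - L / sigma
                dI = p * L / sigma - I / gamma
                dA = (1 - p) * L / sigma - A / gamma
                dR = (I + A) / gamma
                S += dS * dt
                L += dL * dt
                I += dI * dt
                A += dA * dt
                R += dR * dt
            daily_symptomatic.append(I)
        return daily_symptomatic

    # Illustrative run with placeholder parameter values:
    curve = simulate_sliar(beta=0.6, sigma=2.0, gamma=3.0, p=0.67, k=0.5,
                           N=100_000, I0=10, days=120)

A deterministic Euler scheme is used here purely for brevity; a stochastic implementation would be needed to reproduce the between-run variability shown in Figure S2.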

The detection algorithm used as an exemplar in this paper is RAMMIE, and full details of this model are provided elsewhere [15]. Every day, RAMMIE analyses more than 12,000 separate time series. The key outputs from RAMMIE are predictions of the mean number of system-specific and indicator-specific syndromic counts (henceforth baseline data), and their corresponding alarm thresholds (approximately 99% prediction intervals). These thresholds are used in the operational public health system as a very conservative estimate of potentially unusual activity. In considering the alarm thresholds generated by RAMMIE, potential autocorrelation in the residuals was explored. Figure S4 presents partial autocorrelation plots of the residuals from the 14 raw data streams input into RAMMIE, which were used to generate the baselines and alarm thresholds for this paper.
Figure S4: Partial autocorrelation plots for 14 RAMMIE-derived residuals from the datasets used to generate the baseline data and their corresponding alarm thresholds.
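As an illustration of how such thresholds operate in practice, the sketch below flags an alarm whenever an observed daily count exceeds the upper bound of the prediction interval. The data structure and function names are hypothetical and do not reflect RAMMIE's implementation, which is described in [15].

    from dataclasses import dataclass

    @dataclass
    class DailyPrediction:
        date: str
        expected: float   # predicted mean (baseline) count
        upper_99: float   # upper bound of the ~99% prediction interval

    def flag_alarms(predictions, observed_counts):
        """Return the dates on which the observed count exceeds the threshold.

        An "alarm" here is simply an exceedance of the upper prediction bound,
        the first step of the risk assessment process described in the text.
        """
        return [p.date
                for p, obs in zip(predictions, observed_counts)
                if obs > p.upper_99]

    # Hypothetical example: two days of baseline predictions and observations.
    preds = [DailyPrediction("2018-03-01", 40.0, 55.0),
             DailyPrediction("2018-03-02", 42.0, 57.0)]
    print(flag_alarms(preds, [48, 63]))   # -> ['2018-03-02']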
Overall, the mean temporal autocorrelation at a one-day lag across these 14 data streams is moderate, at around 0.3. However, there is variation between data streams, and lower autocorrelations at day 1 are apparent in the data streams for Cryptosporidium as opposed to influenza-like illness. There is also evidence that for NHS111 the autocorrelations are more short-lived (i.e. mostly at day 1), whereas for EDSS influenza-like illness the autocorrelations persist over a longer time span. Automated detection systems such as RAMMIE necessarily need to be conservative (avoiding false negatives). The figures above indicate some temporal autocorrelation in the RAMMIE-derived residuals, and hence the number of statistical alarms will be greater than if this autocorrelation did not exist. This conservative approach is logical within an operational syndromic surveillance system. It contrasts with a more traditional epidemiological study, where a significant result holds much greater prominence and most effort goes into reducing false positives. In an operational system, "alarms" are only the first step in a long risk assessment process [16], through which only around 1 in 1,000 will result in public health action; this is the point at which reducing false positives is emphasised. There are other practical reasons why autocorrelation is difficult to take into account. An operational syndromic surveillance system needs to predict future activity, and the corresponding prediction intervals, weeks ahead. Incorporating an autocorrelation term into the models is hence challenging, because the future activity is unknown.
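For completeness, the lag-1 autocorrelation statistic discussed above can be computed as in the following minimal sketch. The residual series here is simulated purely for illustration and is not data from the paper.

    import numpy as np

    def autocorrelation(residuals, lag=1):
        """Sample autocorrelation of a residual series at the given lag."""
        x = np.asarray(residuals, dtype=float)
        x = x - x.mean()
        return float(x[lag:] @ x[:-lag] / (x @ x))

    # Simulated residuals with some short-lived temporal dependence:
    rng = np.random.default_rng(0)
    noise = rng.normal(size=365)
    res = noise.copy()
    res[1:] += 0.35 * noise[:-1]          # induces moderate lag-1 dependence
    print(autocorrelation(res, lag=1))    # a value in the region of 0.3

    # Partial autocorrelation plots such as Figure S4 can be produced with,
    # e.g., statsmodels.graphics.tsaplots.plot_pacf(res).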