In the following sections, we regard a flu outbreak as a period characterized by a sudden increase in the number of people with flu-like symptoms in a given location. Since flu outbreaks in the United States usually occur from December through April, we defined the "flu season" to be December 1 through April 30 of the subsequent calendar year and the "non-flu season" to be the remainder of the year.
Data collection
Data for this analysis came from DC's ERSSS. In DC, emergency department logs from nine hospitals are sent daily to the health department, where staff code them on the basis of chief complaint, recording the number of patients in each of the following mutually exclusive syndromic categories: death, sepsis, rash, respiratory illness, gastrointestinal illness, unspecified infection, neurological illness, and other complaints [15]. Coding is done hierarchically, in the order given, so patients with two or more complaints are assigned the first applicable category on the list. The data span the calendar dates of September 11, 2001 to June 19, 2006, i.e. 1,743 days. We used only part of the data, covering September 11, 2001 to May 17, 2004 (the period covered by our previous research), as a test sample to determine the parameter values in our fine-tuning analysis. The remaining data were set aside until this was accomplished; the complete data set was then used to validate our choices, as described in more detail below.
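To make the hierarchical coding rule concrete, the following is a minimal sketch in Python; the function and the representation of a patient's chief complaints as a set of category labels are hypothetical illustrations, not the health department's actual coding procedure.

```python
# Syndromic categories in the priority order given above.
SYNDROME_ORDER = [
    "death", "sepsis", "rash", "respiratory illness",
    "gastrointestinal illness", "unspecified infection",
    "neurological illness", "other complaints",
]

def assign_syndrome(complaints):
    """Return the first category in the hierarchy that matches any of the
    patient's complaints, so a patient with two or more complaints is
    counted once, under the highest-priority category."""
    for category in SYNDROME_ORDER:
        if category in complaints:
            return category
    return "other complaints"

# Example: a patient with both GI and respiratory complaints is assigned
# to the higher-priority respiratory category.
print(assign_syndrome({"gastrointestinal illness", "respiratory illness"}))
```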
With the exception of some of the earliest days of the program, missing daily counts were imputed using the strategy described in the appendix. After imputing the missing counts, we standardized the daily counts for each condition and hospital by dividing each daily count by the mean number of daily cases in the non-flu seasons.
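As an illustration, the standardization step can be sketched as follows; the array names and the boolean season mask are our own conventions, not from the paper.

```python
import numpy as np

def standardize(counts: np.ndarray, non_flu: np.ndarray) -> np.ndarray:
    """Divide each (imputed) daily count by the mean daily count over
    non-flu-season days for this hospital/condition pair.

    counts  : 1-D array of daily counts
    non_flu : boolean mask, True for days outside December 1 - April 30
    """
    baseline = counts[non_flu].mean()
    return counts / baseline
```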
Fine-tuning approach
To fine-tune the three algorithms defined below, we used a two-pronged approach that characterizes the performance of the algorithms in rigorous yet practical terms. First, we used a simulation study to estimate receiver operating characteristic (ROC) curves and determine which parameter values provided an optimal trade-off between the false positive rate and sensitivity of an algorithm. Second, we used "known" outbreaks in the DC Department of Health data to assess the timeliness and sensitivity of an algorithm when faced with detecting actual (non-simulated) outbreaks. Data available for the initial fine-tuning analysis spanned the calendar dates from September 12, 2001 to May 17, 2004, i.e. 980 days.
To perform the simulation study, we first created 970 datasets from the DC data with simulated outbreaks, as follows. A linearly increasing outbreak was inserted into each dataset such that x extra cases were inserted on day one of the outbreak, 2x on day two, and 3x on day three. Each dataset had a different start date for the simulated 3-day outbreak: the first dataset's outbreak began on September 12, 2001, the second's on September 13, 2001, and the 970th's on May 8, 2004. Because we use standardized daily counts, x was set to values between zero and one; x = 0.50, for example, corresponds to inserting multiples of half the average number of daily cases in the non-flu seasons.
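For example, the outbreak injection can be sketched as follows; the function and variable names are illustrative.

```python
import numpy as np

def inject_outbreak(series: np.ndarray, start: int, x: float) -> np.ndarray:
    """Insert a linearly increasing 3-day outbreak into a standardized
    daily-count series: x extra cases on day one, 2x on day two, 3x on
    day three."""
    out = series.copy()
    for day in range(3):
        out[start + day] += x * (day + 1)  # x, 2x, 3x extra cases
    return out

# One dataset per candidate start date, as in the 970 simulated datasets:
# datasets = [inject_outbreak(series, s, 0.5) for s in range(n_starts)]
```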
After creating the simulated datasets, we applied the algorithms, for a given set of parameter values and false positive rate (between 0.001 and 0.05), to each one. We then computed the sensitivity of the algorithm, i.e. its ability to detect the simulated outbreak by day three of the outbreak in the non-flu seasons, over all the simulated data sets. It was of primary interest to understand how well the algorithms did at detecting a simulated outbreak against a "normal" background level of disease activity; no known outbreaks occurred during these "normal" background periods. For this reason we did not use data from the flu seasons (December 1 to April 30), which likely contain actual outbreaks in addition to the simulated ones and would therefore yield biased sensitivity rates. Thus, we computed the sensitivity of the algorithm as
$$\text{sensitivity} = \frac{\text{number of simulated non-flu-season outbreaks flagged by day three}}{\text{total number of simulated non-flu-season outbreaks}}.$$
Finally, we plotted the sensitivity of the algorithm to flag by day three in the non-flu seasons against the false positive rate and computed the area under the sensitivity curve (see below). The set of parameter values that gave the curve with the maximum area was considered the optimal set in the simulation study, since it gave the best balance between sensitivity and false positive rates [16].
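The area computation can be sketched with a simple trapezoidal rule over the (false positive rate, sensitivity) pairs; the candidate-selection snippet in the comments is illustrative.

```python
import numpy as np

def auc(fpr, sens) -> float:
    """Trapezoidal area under the sensitivity-vs-false-positive curve,
    evaluated at the pre-specified false positive rates."""
    f, s = np.asarray(fpr, dtype=float), np.asarray(sens, dtype=float)
    order = np.argsort(f)
    f, s = f[order], s[order]
    return float(np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(f)))

# curves = {k: (fpr_grid, day3_sensitivities) for k in candidate_ks}
# best_k = max(curves, key=lambda k: auc(*curves[k]))
```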
After selecting a small number of candidate values that performed well under the ROC curve approach, we proceeded to the second part of our fine-tuning analysis, in which we assessed how well the selected candidate values did at detecting previously "known" outbreaks in the DC Department of Health data set. Specifically, we assessed how well the algorithms did at flagging the beginnings of the flu outbreaks in 2002 and 2004 and at flagging gastrointestinal outbreaks that occurred in three hospitals during the winter of 2003.
We selected data from three hospitals for our fine-tuning analysis based on the number of emergency department admissions. Throughout this paper, we refer to these as hospitals S (small), M (medium), and L (large). For each, we used data for two conditions: unspecified infection and gastrointestinal.
The results of our fine-tuning analysis are presented primarily in two different graphical formats. The results of the simulation studies are presented in terms of ROC curves, as in Figures 1 and 2. For this format, the horizontal axis represents the false positive rate and the vertical axis represents the sensitivity rate. The curves display the sensitivity rate on day three of the simulated outbreak in the non-flu season that is achieved for each pre-determined false positive rate. (The non-flu season is used under the assumption that there are no actual outbreaks during this period.) The different curves correspond to different outbreak sizes or parameter values.
The application of the detection algorithms to actual data in comparison with known outbreaks is displayed in a different format, as exemplified by Figures 3 and 4. In this form, smoothed values for one or more data streams (unspecified infection cases in hospital S in Figure 3 and hospital L in Figure 4) are presented as a curve. The flagging of the detection algorithms is represented by symbols plotted with the day of the flag on the horizontal axis and the value of the test statistic (e.g. CUSUM) on that date on the vertical axis. The flagging times are also marked along the time axis with vertical lines color-coded to match the symbols. Note that the vertical axis is on a square root scale because of its variance-stabilizing properties. "Winter" is used in these graphs and throughout the remainder of the paper as shorthand for November 1 of the previous year to April 1 of the year displayed or discussed. This definition, which differs from our definition of the "flu season," was chosen simply to ensure that the figures focus on the part of the data where flu outbreaks actually began during the study period.
Statistical detection algorithms
We utilize three detection algorithms in this study. The first, the CUSUM (for "cumulative summation") algorithm, monitors the daily statistic $S_i$, which is defined by the recursive formula
$$S_i = \max\{0,\ S_{i-1} + (X_i - \mu) - k\}, \qquad S_0 = 0,$$
where $X_i$ denotes the observed daily count on day $i$, $\mu$ denotes the overall mean daily count estimated from the data, and $k$ is an offset parameter set by the user [17, 18]. This statistic cumulates positive deviations from the average in order to detect small but persistent increases in cases. The CUSUM algorithm alarms or flags at time $\tau = \inf\{i : S_i > h\}$, where $h$ is computed empirically to guarantee a fixed false positive rate (specified by the user) in the non-flu seasons. In our analysis, we focus on fine-tuning the user-specified value of $k$. The parameter $k$ arises from the theoretical derivation of the univariate CUSUM as a sequential likelihood ratio test for a shift $\delta$ in the mean parameter of a normal distribution. Under standard assumptions, $k$ is usually set to $\delta/2$ for a one-time shift in level that must be detected quickly. In syndromic surveillance, however, we typically expect outbreaks that increase in size over time, so the standard theory does not determine an optimal $k$; instead, $k$ must be determined empirically using methods such as those described below. The theory does, however, suggest that the choice of $k$ matters a great deal: if $k$ is too small, the algorithm will flag every fluctuation in the data, while if it is too large, the algorithm will not detect anything at all.
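For concreteness, the following is a minimal sketch of the CUSUM recursion in Python; the function name and the choice to record every day on which $S_i$ exceeds $h$ (rather than stopping at the first flag) are our own illustrative conventions.

```python
import numpy as np

def cusum_flags(x: np.ndarray, mu: float, k: float, h: float):
    """Run the CUSUM recursion S_i = max(0, S_{i-1} + (X_i - mu) - k)
    and return the indices of all days on which S_i > h."""
    s, flags = 0.0, []
    for i, xi in enumerate(x):
        s = max(0.0, s + (xi - mu) - k)  # cumulate positive deviations
        if s > h:
            flags.append(i)              # day i would raise an alarm
    return flags
```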
The CUSUM based on deviations from an exponentially weighted moving average, which we refer to throughout as EXPO, adds one additional step to the CUSUM algorithm described above [13]. First, the EXPO algorithm predicts the daily counts $X_i$ using an exponentially weighted moving average. Specifically, it defines
$$\hat{X}_i = \lambda X_{i-1} + (1 - \lambda)\hat{X}_{i-1},$$
where $0 \le \lambda \le 1$ is a user-specified parameter that represents the degree of smoothing to be done in the data (i.e. smaller values correspond to more smoothing). The algorithm then monitors the differences between the actual and predicted counts using the statistic $S_i$, which is defined by the recursive formula
$$S_i = \max\{0,\ S_{i-1} + (X_i - \hat{X}_i) - k\}, \qquad S_0 = 0.$$
As with CUSUM, the EXPO algorithm flags at time $\tau = \inf\{i : S_i > h\}$, where $h$ is computed empirically to guarantee a fixed (user-specified) false positive rate in the non-flu seasons. In contrast to CUSUM, EXPO should allow for more sensitive detection when outbreaks appear against a linearly increasing background pattern. With EXPO, two parameters must be fine-tuned: $k$ and $\lambda$. Both are supplied by the user, and their selected values will have a large impact on the statistical performance of the algorithm.
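A similar sketch for EXPO follows, assuming a one-step-ahead EWMA prediction initialized at the series mean; that initialization is an implementation detail not specified above.

```python
import numpy as np

def expo_flags(x: np.ndarray, lam: float, k: float, h: float):
    """CUSUM on the deviations of daily counts from an exponentially
    weighted moving average prediction."""
    pred = float(np.mean(x))  # initial prediction (assumed)
    s, flags = 0.0, []
    for i, xi in enumerate(x):
        s = max(0.0, s + (xi - pred) - k)     # CUSUM on X_i - predicted X_i
        if s > h:
            flags.append(i)
        pred = lam * xi + (1.0 - lam) * pred  # EWMA update for day i + 1
    return flags
```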
Lastly, we worked to fine-tune the multivariate CUSUM algorithm, which we refer to as MV CUSUM. The MV CUSUM was developed for monitoring multiple streams of data on a daily basis (e.g., streams of data from more than one hospital or from more than one condition within a hospital) [19] and can offer greater utility when an outbreak is likely to influence the daily counts of more than one symptom group. It follows the same logic as the standard CUSUM except that the daily counts are now represented by a vector $X_i$ of dimension $p \times 1$, where $p$ is the number of streams being analyzed together, and $\mu$ is the corresponding $p \times 1$ vector of mean daily counts. We define
$$S_i = (S_{i-1} + X_i - \mu)\left(1 - \frac{k}{C_i}\right) \quad \text{if } C_i > k,$$
and $S_i = 0$ if $C_i \le k$, where
$$C_i = \left[(S_{i-1} + X_i - \mu)'\, \Sigma^{-1}\, (S_{i-1} + X_i - \mu)\right]^{1/2}$$
and $\Sigma^{-1}$ is the inverse of the variance-covariance matrix for the $p$ streams of data, estimated using only daily counts from the non-flu seasons. The MV CUSUM algorithm flags at time $\tau = \inf\{i : C_i > h\}$. For the MV CUSUM algorithm, $k$ is the parameter of interest for fine-tuning purposes; it is user-specified and its value will impact the statistical performance of the algorithm.
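A sketch of the MV CUSUM recursion as written above; the array layout and function signature are illustrative assumptions.

```python
import numpy as np

def mv_cusum_flags(X: np.ndarray, mu: np.ndarray,
                   sigma_inv: np.ndarray, k: float, h: float):
    """Multivariate CUSUM over an (n_days, p) array of standardized
    counts. `mu` is the p-vector of non-flu-season means and `sigma_inv`
    the inverse covariance matrix estimated from non-flu-season days."""
    p = X.shape[1]
    s, flags = np.zeros(p), []
    for i, xi in enumerate(X):
        d = s + (xi - mu)
        c = float(np.sqrt(d @ sigma_inv @ d))             # C_i
        s = np.zeros(p) if c <= k else d * (1.0 - k / c)  # shrink toward 0
        if c > h:
            flags.append(i)                               # flag when C_i > h
    return flags
```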
Analysis of DC influenza data
In order to evaluate the ERSSS data's ability to determine the beginning of the seasonal influenza outbreak in DC, we selected a number of 'candidate' syndromic surveillance systems within the ERSSS and compared how well each did at flagging the beginning of the flu season. Additionally, we compared each one to CDC's sentinel physician data, which are based on a network of physicians reporting each week on the proportion of cases they have seen with influenza-like illness (ILI). Our 'candidate' systems included both univariate and multivariate versions. Initially, we examined how well each of the eight syndrome categories did alone at flagging the beginning of the flu season. Then, based on the initial performance, we focused on the ability of the unspecified infection and respiratory categories to detect the start of the flu season, both within a single hospital and taken together across hospitals.
Finally, we examined how well CNMC did versus all the others when using just the unspecified infection category, just the respiratory category, and both categories together. Because CDC's sentinel physician data are available only on a national and regional basis, we chose to use the South Atlantic region, which includes the District of Columbia (CDC, various years), in our comparisons. All the comparisons in the influenza analysis used fine-tuned versions of the algorithms, based on the results of the fine-tuning analysis.