The usual syndromic surveillance methods are supervised and based on statistical tools. Herein, we describe a novel method that can be used when the supervised approach is not applicable. That situation arises when we are faced with the detection of "unexpected" events, which, by definition, are of major interest for epidemiological alert. Indeed, our primary goal is to help recognize, as early as possible, totally unexpected epidemiological patterns. The detection-triggering signal can be the mere increase of an isolated diagnosis code. In this case, a regression method with a threshold would have performed better, but only under the very restrictive condition that this code had been identified in advance. However, the signal can also be an unusual association of different color patches on the monitor, which appears novel to the observer and triggers an in-depth epidemiological investigation.
Our proposed model is similar to what already happens in an air-traffic-control room: most of the routine tasks are now automated and the attention of the human observers is now focused on "unexpected" events. Likewise, we propose relying on the classical supervised methods for the usual situations that happen regularly (e.g., seasonal flu epidemics), and we seek to improve our detection of the unexpected epidemiological events that are extremely critical from a public health perspective, precisely because they are unexpected.
An important technical problem is the choice of the time resolution of the display on the monitor. For the French Sentinel Network, the time resolution was one month: that timeframe is clearly unsuitable for prospective surveillance and was only used to show the potential of our method to recognize a very special event (i.e., the health impact of the 2003 heat wave). Similarly, for the HED, the choice of weekly resolution was only illustrative and was imposed by the number of cases available per day, keeping in mind that several hundred cases are needed to create an informative image. In a real-world application, the choice of the temporal resolution would depend on the nature of the class of events to be identified: hourly resolution or, at worst, daily resolution would be desirable to recognize a disease associated with a terrorist attack. The temporal resolution chosen also depends on the spatial resolution, because the number of cases observed per image decreases as either the spatial or the temporal resolution becomes finer.
For example, the HED data we used were collected in real time. The hospital that provided those data has ~150 consultations per day. Using a surveillance system based on the network of all Paris region public hospitals (Assistance Publique-Hôpitaux de Paris), which collects real-time data on 4000 patients per day (i.e., ~150 patients/hour), would, in contrast, allow a much shorter timeframe, of the order of a few hours.
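The trade-off between case volume and time resolution can be illustrated with a short calculation. The sketch below is only indicative: it assumes a constant arrival rate over the day and takes 300 cases as a stand-in for the "several hundred cases" needed for an informative image. Both the function name and the 300-case figure are our assumptions, not values from the study.

```python
def hours_to_accumulate(cases_needed: int, patients_per_day: float) -> float:
    """Hours of data collection needed to reach `cases_needed`,
    assuming patients arrive at a constant rate over 24 hours."""
    if patients_per_day <= 0:
        raise ValueError("patients_per_day must be positive")
    return cases_needed / patients_per_day * 24.0


# Assumption: 300 cases stands in for "several hundred cases" per image.
CASES_NEEDED = 300

# Single emergency department (~150 consultations per day).
single_hospital_hours = hours_to_accumulate(CASES_NEEDED, 150)

# Regional hospital network (4000 patients per day).
regional_network_hours = hours_to_accumulate(CASES_NEEDED, 4000)

print(f"Single hospital:  {single_hospital_hours:.1f} h per image")
print(f"Regional network: {regional_network_hours:.1f} h per image")
```

Under these assumptions, a single department needs about two days of data per image, whereas the regional network reaches the same count in under two hours, which is consistent with a timeframe of a few hours.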
Furthermore, for the method we propose, we chose to code diagnoses and symptoms with the ICPC-2 system, because it was developed precisely for primary care patients, who are the best target for surveillance of emerging diseases or bioterrorist attacks. However, the same paradigm developed herein could be used with other classification methods.
In a first test example, we showed that visual inspection of the ICPCviews obtained based on Sentinel Network GPs' transmissions during the 2003 heat wave in France would likely have raised suspicion that something unusual was occurring at that time. Indeed, in light of the public health and political scandal that ensued, it is highly rewarding that the images generated with our model heralded the high morbidity and mortality (later documented) that passed unnoticed. At the time of the event, the only public health warnings came from newspapers and funeral parlors, not from the health information systems, which were therefore far from the ideal real-time systems we described above. Imagine that a wall of monitors had displayed patterns similar to those seen in Figure 2, derived from data collected throughout the country. We are convinced that trained "epidemiology watchers" would have detected the unexpected patterns and would have triggered the investigations that were so sorely lacking.
The second example we used was the detection of a flu-like outbreak. Flu-like symptoms are observed at the onset of many diseases, during bioterrorist attacks (e.g., smallpox, plague, anthrax), and for emerging diseases (e.g., severe acute respiratory syndrome, Chikungunya, pandemic flu). Numerous supervised techniques have proved successful at recognizing seasonal influenza outbreaks, and the goal of our technique is not to compete with those methods in this situation.
Now, imagine an outbreak of influenza-like syndromes occurring in August, or the onset of a new disease heralded by symptoms like epistaxis or purpura; the supervised methods would be, by definition, unsuited to detect them, while an unsupervised technique, like the one we propose here, could work. Finally, the method is designed to have the highest sensitivity possible, in order to detect rare, unusual and unexpected signals. Achieving a good positive predictive value would require, in addition, a "back room", where human experts would validate the signals based on appropriate field epidemiological investigations.
One caveat of the method is that it relies, by definition, on human observers. Hence, its effectiveness will depend upon the quality of these observers and their training. The system's quality and that of the epidemiology watcher could be measured with a research protocol based on simulated datasets. This approach has been successfully used in epidemiological surveillance to test new algorithms [28]. Simulated datasets could be generated by adding a given number of codes of interest (e.g., those compatible with an anthrax attack) to an existing database (e.g., the present HED database). Epidemiology watchers would then be shown the successive monitors displaying the evolution of the images within the graphic reference frame, and asked to indicate whether and when they could identify an outbreak. Such a design would allow easy computation of the sensitivity and specificity of the system (as a function of the number of simulated codes added to the database). Standard statistical techniques would also allow assessment of intra- and interobserver variabilities.
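The evaluation protocol outlined above could be sketched as follows. This is a minimal illustration under our own assumptions, not an implementation used in the study: the function names are hypothetical, the diagnosis codes are placeholders rather than a validated list of codes compatible with any particular attack, the watchers' answers are invented for the example, and Cohen's kappa is shown as just one standard option for interobserver agreement.

```python
import random


def inject_codes(baseline, codes_of_interest, n_added, rng):
    """Return a simulated dataset: the baseline list of diagnosis codes
    plus `n_added` codes drawn at random from `codes_of_interest`."""
    extra = [rng.choice(codes_of_interest) for _ in range(n_added)]
    return list(baseline) + extra


def sensitivity_specificity(truth, calls):
    """Compare the watcher's calls with the ground truth.
    truth[i] is True if an outbreak was simulated in sequence i;
    calls[i] is True if the watcher reported an outbreak."""
    tp = sum(t and c for t, c in zip(truth, calls))
    fn = sum(t and not c for t, c in zip(truth, calls))
    tn = sum(not t and not c for t, c in zip(truth, calls))
    fp = sum(not t and c for t, c in zip(truth, calls))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sequence")
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two observers giving binary calls on the same sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pa = sum(rater_a) / n
    pb = sum(rater_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:  # both observers always give the same single answer
        return 1.0
    return (observed - expected) / (1 - expected)


# Placeholder codes, for illustration only.
rng = random.Random(0)
baseline = ["R74"] * 5
simulated = inject_codes(baseline, ["A03", "R05"], n_added=3, rng=rng)

# Invented example: eight sequences shown to two watchers.
truth = [True, True, True, False, False, False, False, True]
watcher_1 = [True, False, True, False, True, False, False, True]
watcher_2 = [True, True, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(truth, watcher_1)
kappa = cohen_kappa(watcher_1, watcher_2)
```

Repeating the comparison for several values of `n_added` would give sensitivity and specificity as a function of the number of simulated codes, as the protocol suggests.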