Exploiting routinely collected severe case data to monitor and predict influenza outbreaks

Background: Influenza remains a significant burden on health systems. Effective responses rely on the timely understanding of the magnitude and the evolution of an outbreak. For monitoring purposes, data on severe cases of influenza in England are reported weekly to Public Health England. These data are both readily available and have the potential to provide valuable information to estimate and predict the key transmission features of seasonal and pandemic influenza.

Methods: We propose an epidemic model that links the underlying unobserved influenza transmission process to data on severe influenza cases. Within a Bayesian framework, we infer retrospectively the parameters of the epidemic model for each seasonal outbreak from 2012 to 2015, including: the effective reproduction number; the initial susceptibility; the probability of admission to intensive care given infection; and the effect of school closure on transmission. The model is also implemented in real time to assess whether early forecasting of the number of admissions to intensive care is possible.

Results: Our model of admissions data allows reconstruction of the underlying transmission dynamics revealing: increased transmission during the season 2013/14 and a noticeable effect of the Christmas school holiday on disease spread during seasons 2012/13 and 2014/15. When information on the initial immunity of the population is available, forecasts of the number of admissions to intensive care can be substantially improved.

Conclusion: Readily available severe case data can be effectively used to estimate epidemiological characteristics and to predict the evolution of an epidemic, crucially allowing real-time monitoring of the transmission and severity of the outbreak.

Electronic supplementary material: The online version of this article (10.1186/s12889-018-5671-7) contains supplementary material, which is available to authorized users.

are crucial to anticipate demands on health care facilities (e.g. the number of hospital beds required) in each season. Such timely predictions are even more crucial to inform prompt, targeted responses in the event of a newly emerging strain with the potential to cause a pandemic [7].
Epidemic models are increasingly used to understand the effect of particular interventions, including vaccination policies [8], school closures to reduce transmission in a pandemic [9-11], reinforced use of antiviral drugs [12] and changes in hospital management policies.
These models are generally applied to data, such as General Practitioner (GP) consultations for influenza-like illness (ILI) [8,13] or health-related online queries [14], which are only loosely related to the actual burden and are characterized by highly volatile noise.
By contrast, more specific and timely data on a sample of confirmed cases (e.g. confirmed influenza hospitalizations) may be collected routinely by national health systems. One example is the UK Severe Influenza Surveillance System (USISS) [15], which records weekly counts of Intensive Care Unit (ICU) and High Dependency Unit (HDU) admissions and deaths with confirmed influenza in all hospital trusts in England. Recently, in the context of a pandemic, some attention has been paid to estimating and predicting pandemic transmission from routinely collected confirmed-case data [16]. This required the development of a highly complex model, which is difficult to use for prediction in a seasonal monitoring setting, where less effort is devoted to data collection. Here we explore a much simpler model, to be applied to seasonal influenza and possibly during a pandemic, that relies only on timely available data on severe cases. We therefore investigate whether data collected through USISS can characterise both seasonal and pandemic epidemics, with the aims of both estimation and prediction.
We formulate an epidemic model that links the available USISS data to the underlying unobserved dynamics of influenza in England. The model parameters are inferred using data from the seasonal epidemics in 2012-2015 to obtain nation-level estimates of transmission, as measured by R_n, the average number of new cases generated by an infectious individual in a partially immune population, and of severity, as measured by the probability of ICU admission given infection.
Additionally, to assess the predictive power of the model, we perform analyses at different dates within each season. Finally, we study what would happen in the event of a pandemic, when the USISS surveillance scheme would be upgraded to collect more information.

Data
Following the 2009 pandemic, the World Health Organization (WHO) declared the beginning of a post-pandemic phase [17], encouraging national public health agencies to establish hospital-based surveillance systems to monitor the epidemiology of severe influenza. In response to these guidelines, and to understand the baseline epidemiology of severe influenza, the UK developed a surveillance system to monitor severe cases of influenza, the USISS [18,19]. After a pilot phase in 2010/11, USISS has run for each influenza season, providing data on laboratory-confirmed ICU/HDU influenza cases and on laboratory-confirmed hospitalized cases.
According to the USISS protocol [18], all National Health Service (NHS) trusts report the weekly number of laboratory-confirmed influenza cases admitted to ICU/HDU and the number of confirmed influenza deaths in ICU/HDU via a web tool. An ICU/HDU case is defined as a person who is admitted to ICU/HDU and has a laboratory-confirmed influenza A (including H1, H3 or novel) or B infection.
USISS runs annually from week 40 to week 20 of the following year but, in the event of a pandemic, it can be activated outside this window, and it will then collect the same data at all levels of care, not only ICU/HDU. Data are available by age group and influenza type/subtype. However, when the data are stratified by both, as well as by week, many zero counts are observed. We therefore consider only the total weekly ICU/HDU admissions (Fig. 1). Each season between 2012 and 2015 is shown, and the shape of the epidemic varies substantially across seasons. In the 2012/13 season, mainly characterized by Influenza B and Influenza A(H3N2) outbreaks, the number of admissions peaks early and then remains on a plateau for several months [20]. In 2013/14, when the predominant strain was A(H1N1), the time series displays a smoother increase, a well-localized peak and a subsequent regular decrease [21]. Lastly, in 2014/15, the number of ICU admissions peaks earlier and drops dramatically at the beginning of the new year; this drop is followed by a smaller wave, resulting in a time series with a double peak. During this season, Influenza A(H3N2) was the predominant circulating virus and the total number of ICU admissions was higher; this strain is well known to lead to more severe outcomes, particularly in the elderly [22].
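Collapsing the stratified counts to a single weekly total can be sketched as follows. The record layout (year, ISO week, age group, type/subtype, count) and the toy values are hypothetical and do not reflect the actual USISS data format; keying on (year, week) keeps the season in the right order across the new year, since a season runs from week 40 to week 20.

```python
from collections import defaultdict

# Hypothetical stratified records: (year, ISO week, age group, type/subtype, ICU/HDU admissions).
records = [
    (2014, 51, "15-44", "A(H3N2)", 2),
    (2014, 51, "65+", "B", 0),
    (2014, 52, "65+", "A(H3N2)", 5),
    (2014, 52, "0-4", "B", 1),
    (2015, 1, "45-64", "A(H3N2)", 0),
]


def weekly_totals(rows):
    """Sum admissions over age group and type/subtype, keyed by (year, week)."""
    totals = defaultdict(int)
    for year, week, _age_group, _subtype, count in rows:
        totals[(year, week)] += count
    # Sorting (year, week) tuples keeps the season in calendar order across New Year.
    return dict(sorted(totals.items()))


print(weekly_totals(records))  # {(2014, 51): 2, (2014, 52): 6, (2015, 1): 0}
```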

Additional sources of information
In addition to the mandatory scheme, a subgroup of NHS trusts in England is recruited every year to participate in the USISS sentinel scheme [19,23], which reports weekly numbers of laboratory-confirmed influenza cases hospitalised at all levels of care. These data provide useful information on the process between influenza infection and ICU admission (e.g. the time elapsing from symptom onset to ICU admission). Further information on this process (e.g. the proportion of symptomatic cases) can be found in the existing literature on the incubation period of influenza [24] and the hospitalization fatality rate [25].

Model
We used an epidemic model (Fig. 2) to describe the spread of influenza in England [26]. We assumed that the population changes according to a deterministic model in continuous time. Time is measured in days and denoted by t ≥ 0. The population is divided according to health status into four compartments: susceptible (S), exposed (E), infectious (I) and removed (R). The E and I compartments are each further divided into two (E_1, E_2 and I_1, I_2, respectively), so that the waiting times in the E and I states follow gamma rather than exponential distributions [27]. In the formulas below, the letters S, E_1, E_2, I_1, I_2 and R denote the number of people in each compartment. The total size of the population is fixed within each season and denoted by N. Transitions between compartments are determined by the rates λ(t), σ and γ, explained below.
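Why two sub-compartments give gamma-distributed waiting times can be checked numerically: passing through E_1 and then E_2, each left at rate σ, takes the sum of two exponential times, i.e. a Gamma(shape 2, rate σ) time with mean 2/σ. The sketch below assumes σ = 2/d_L, so that the mean latent period is d_L, and uses an illustrative d_L = 2 days (not a value taken from the paper). The two-stage waiting time keeps the mean but halves the variance relative to a single exponential stage.

```python
import numpy as np

rng = np.random.default_rng(1)
d_L = 2.0              # illustrative mean latent period in days (assumption)
sigma = 2.0 / d_L      # exit rate of each of the two stages E1 and E2
n = 200_000

# Two stages in sequence: the sum of two Exp(sigma) times is Gamma(shape=2, rate=sigma).
two_stage = rng.exponential(1.0 / sigma, n) + rng.exponential(1.0 / sigma, n)
# A single exponential stage with the same mean, for comparison.
one_stage = rng.exponential(d_L, n)

print(two_stage.mean(), two_stage.var())  # both close to 2.0 (mean d_L, variance d_L**2 / 2)
print(one_stage.mean(), one_stage.var())  # close to 2.0 and 4.0 (mean d_L, variance d_L**2)
```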
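The infection rate λ(t) is proportional to the proportion of people in the infectious compartments at time t, (I_1(t) + I_2(t))/N, with a time-varying transmission rate β(t) as the constant of proportionality:
$$\lambda(t) = \beta(t)\,\frac{I_1(t) + I_2(t)}{N}. \tag{1}$$
The transmission rate β(t) is a piecewise-constant function of time: it equals β_0 while schools are open and is multiplied by a scaling factor κ ∈ (0, 2], which expresses the change due to school closure [10], while schools are closed, as reported in Eq. 2:
$$\beta(t) = \begin{cases} \beta_0 & \text{if schools are open at time } t,\\ \kappa\,\beta_0 & \text{if schools are closed at time } t. \end{cases} \tag{2}$$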
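The transition rates σ and γ are related to the mean latent period, d_L, and the mean infectious period, d_I, by
$$\sigma = \frac{2}{d_L}, \qquad \gamma = \frac{2}{d_I}, \tag{3}$$
so that the total time spent in the two E (respectively I) stages has mean d_L (respectively d_I). The system of differential equations that defines the epidemic model is reported in Eq. 4:
$$\begin{aligned}
\frac{dS}{dt} &= -\lambda(t)\,S,\\
\frac{dE_1}{dt} &= \lambda(t)\,S - \sigma E_1,\\
\frac{dE_2}{dt} &= \sigma E_1 - \sigma E_2,\\
\frac{dI_1}{dt} &= \sigma E_2 - \gamma I_1,\\
\frac{dI_2}{dt} &= \gamma I_1 - \gamma I_2,\\
\frac{dR}{dt} &= \gamma I_2.
\end{aligned} \tag{4}$$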
Here we have assumed homogeneous mixing among contacts (i.e. all people are equally likely to meet, irrespective of, for example, their age group or place of residence).
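A minimal Python sketch of the deterministic SEEIIR system with a piecewise-constant transmission rate (β_0 in school term, κβ_0 during holidays), assuming σ = 2/d_L and γ = 2/d_I because each period is split into two stages. The authors solved the system in R with deSolve; here SciPy's `solve_ivp` stands in, and every parameter value, the population size and the holiday window are illustrative assumptions, not estimates. The step size is capped at one day so the solver cannot step over the jumps in β(t).

```python
import numpy as np
from scipy.integrate import solve_ivp


def beta_t(t, beta0, kappa, holidays):
    """Transmission rate: kappa * beta0 inside any school holiday [start, end), beta0 otherwise."""
    in_holiday = any(start <= t < end for start, end in holidays)
    return kappa * beta0 if in_holiday else beta0


def seeiir_rhs(t, y, N, beta0, kappa, sigma, gamma, holidays):
    """Right-hand side of the SEEIIR model, with compartments expressed as counts."""
    S, E1, E2, I1, I2, R = y
    lam = beta_t(t, beta0, kappa, holidays) * (I1 + I2) / N  # infection rate lambda(t)
    return [
        -lam * S,
        lam * S - sigma * E1,
        sigma * E1 - sigma * E2,
        sigma * E2 - gamma * I1,
        gamma * I1 - gamma * I2,
        gamma * I2,
    ]


# Illustrative inputs only (assumptions, not estimates from the paper).
N = 5.4e7                      # population size
d_L, d_I = 2.0, 2.0            # mean latent and infectious periods (days)
sigma, gamma = 2.0 / d_L, 2.0 / d_I
pi0, I0 = 0.5, 100.0           # initial proportion susceptible, initial infectious
beta0, kappa = 1.4, 1.2        # school-term rate and holiday scaling factor
holidays = [(60.0, 74.0)]      # one hypothetical two-week school break (days)

y0 = [pi0 * N, 0.0, 0.0, I0 / 2, I0 / 2, (1 - pi0) * N - I0]
days = np.arange(0, 211)       # 30 weeks, evaluated daily
sol = solve_ivp(seeiir_rhs, (0, 210), y0, t_eval=days, max_step=1.0,
                args=(N, beta0, kappa, sigma, gamma, holidays))

S = sol.y[0]
# New infections per week: S at the end of the previous week minus S at the end of this week.
weekly_infections = -np.diff(S[::7])
print(weekly_infections.round())
```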
This transmission model is linked to the data on ICU admissions through an observational model that defines the time elapsing from infection to ICU admission and the probability of ICU admission conditional on infection.
Denote by f_ICU|I(w) the probability that w weeks elapse from infection to ICU admission, and by p_ICU the probability of ICU admission given infection. The average number of ICU admissions during week w, μ_w, is then linked to the weekly new infections in the preceding weeks via a convolution:
$$\mu_w = p_{ICU} \sum_{k=1}^{w} I_k\, f_{ICU|I}(w - k),$$
where I_w = S(t_w − 7) − S(t_w) is the count of new infections during week w, t_w being the last day of week w (time being measured in days).
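The convolution can be computed with NumPy's `np.convolve`. In this sketch the weekly infection counts, the delay distribution f_ICU|I (probabilities of 0, 1 and 2 weeks from infection to admission) and p_ICU are hypothetical values chosen for illustration; the paper derives f_ICU|I in Additional file 1. Arrays are 0-based, so week k of the formula is element k − 1.

```python
import numpy as np


def expected_icu(new_infections, f_delay, p_icu):
    """mu_w = p_icu * sum_k I_k * f(w - k): a discrete convolution truncated to the observed weeks."""
    I = np.asarray(new_infections, dtype=float)
    f = np.asarray(f_delay, dtype=float)
    # np.convolve(I, f)[n] = sum_j I[j] * f[n - j]; keep only the first len(I) weeks.
    return p_icu * np.convolve(I, f)[: len(I)]


# Hypothetical inputs: weekly new infections and a 0-2 week infection-to-ICU delay.
infections = [1000, 4000, 9000, 6000]
delay = [0.2, 0.5, 0.3]
print(expected_icu(infections, delay, p_icu=0.001))  # [0.2 1.3 4.1 6.9]
```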
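To formulate the likelihood of the data, we assumed that the observed number of ICU admissions in week w, ICU_w, is a realisation of a Negative Binomial random variable with mean μ_w and overdispersion parameter η (so that its variance is ημ_w), i.e. ICU_w has density function
$$P(ICU_w = y) = \frac{\Gamma(y + r_w)}{\Gamma(r_w)\, y!} \left(\frac{1}{\eta}\right)^{r_w} \left(1 - \frac{1}{\eta}\right)^{y}, \qquad y = 0, 1, 2, \ldots,$$
with r_w = μ_w/(η − 1). Additional file 1 contains the full specification of the transmission model, its re-parametrization and the full derivation of f_ICU|I(w).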
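A sketch of the Negative Binomial log-density with mean μ_w and r_w = μ_w/(η − 1), assuming (as that choice of r_w implies) a success probability of 1/η, so that the variance is ημ_w with η > 1. The function names are hypothetical; summing the log-densities over weeks gives the log-likelihood used for inference.

```python
import math


def nb_logpmf(y, mu, eta):
    """log P(Y = y) for a Negative Binomial with mean mu and variance eta * mu (eta > 1)."""
    r = mu / (eta - 1.0)  # size parameter r_w
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(1.0 / eta) + y * math.log(1.0 - 1.0 / eta))


def log_likelihood(counts, mus, eta):
    """Sum of weekly log-densities: observed counts ICU_w against model means mu_w."""
    return sum(nb_logpmf(y, mu, eta) for y, mu in zip(counts, mus))


print(log_likelihood([3, 7, 12], [4.0, 8.0, 10.0], eta=1.8))
```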

Parameter estimation
To define the epidemic we need to estimate or set both the transition rate parameters (i.e. β_0, κ, σ, γ) and the initial state of the epidemic (i.e. the number of people in each compartment at t = 0). The epidemic model can be re-parametrized [27], and a number of quantities may be defined, including: π, the initial proportion of non-immune people; I_tot(0) = I_1(0) + I_2(0), the total number of infectious people at t = 0; the basic reproduction number R_0, the average number of successful transmissions per infectious person in a fully susceptible population; and the effective reproduction number R_n, the average number of successful transmissions per infectious person in a partially susceptible population. All these quantities are useful from a health-policy perspective.
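Under homogeneous mixing, and evaluating the transmission rate at its school-term value β_0, these quantities take a simple form: assuming γ = 2/d_I, the mean time spent in the two I stages is 2/γ = d_I, so R_0 = β_0 d_I and R_n = π R_0. This is a sketch of the standard relationships for this model structure, not the authors' re-parametrization (given in Additional file 1); the hypothetical example also shows that different combinations of π and β_0 can yield the same R_n.

```python
def reproduction_numbers(beta0, d_I, pi):
    """Return (R0, Rn) for a homogeneous-mixing SEEIIR model with school-term rate beta0.

    R0: expected transmissions over a mean infectious period d_I in a fully
    susceptible population; Rn: the same when only a proportion pi is susceptible.
    """
    R0 = beta0 * d_I
    return R0, pi * R0


# Two hypothetical (pi, beta0) pairs with d_I = 2 days: different R0, same Rn.
for pi, beta0 in [(0.5, 1.26), (0.9, 0.7)]:
    R0, Rn = reproduction_numbers(beta0, d_I=2.0, pi=pi)
    print(f"pi={pi}, beta0={beta0}: R0={R0:.2f}, Rn={Rn:.2f}")
# pi=0.5, beta0=1.26: R0=2.52, Rn=1.26
# pi=0.9, beta0=0.7: R0=1.40, Rn=1.26
```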
The parameters σ and γ are assumed known from previous studies [13,24], as they can only be inferred from detailed individual-level information. Likewise, the population size N is assumed known and fixed to the values estimated by the Office for National Statistics (ONS) [28].
We used a Bayesian approach to draw inference on the other parameters. Bayesian inference consists of summarizing prior information on a generic parameter θ in a distribution π(θ) and updating it with the information from a set of data x, contained in the likelihood L(θ | x), to derive the posterior distribution:
$$\pi(\theta \mid x) = \frac{L(\theta \mid x)\,\pi(\theta)}{\int L(\theta' \mid x)\,\pi(\theta')\,d\theta'} \propto L(\theta \mid x)\,\pi(\theta).$$
We considered two scenarios. In the first, we assumed that no prior information on the values of the parameters is available except for lower and upper bounds, so that the prior distributions on all the parameters are non-informative (see Additional file 1). Table 1 lists the lower and upper limits of some transformations of the parameters and the values assumed known in this scenario. In the second scenario, we used sero-prevalence data from the 2010/11 season [29] to formulate a prior distribution for the initial susceptibility π. The use of sero-prevalence data to describe the immunity of a population is open to debate, since the results may only extend to seasons in which similar predominant strains circulate. Here, sero-samples were taken during an H1-predominant season: this subtype was also prevalent in the 2013/14 season, but not in 2014/15. However, combining this prior with the data allows us to test how much prior knowledge is needed to overcome the lack of information about susceptibility in the data. We also derived an informative prior distribution on p_ICU by combining estimates of the probability of hospitalization given infection from a previous severity study [25] with estimates of the probability of ICU/HDU admission given hospitalization from the aggregate data of the USISS sentinel scheme. Table 2 lists the prior distributions of the two parameters that change in the informative scenario. The remaining parameters are again assumed to be uniformly distributed.
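To see what an informative prior of this kind implies, the sketch below computes the median, mean and central 95% interval of the lognormal prior on p_ICU listed with Table 2, LogNorm(log μ = log(0.000239), log σ = 1). Reading the two arguments as the mean and standard deviation on the log scale is our interpretation of the notation.

```python
import math
from statistics import NormalDist

log_mu, log_sd = math.log(0.000239), 1.0      # log-scale mean and sd (assumed reading)
z = NormalDist().inv_cdf(0.975)               # standard normal 97.5% quantile, about 1.96

median = math.exp(log_mu)                     # lognormal median = exp(log-scale mean)
mean = math.exp(log_mu + log_sd**2 / 2)       # lognormal mean
lower = math.exp(log_mu - z * log_sd)         # 2.5% prior quantile
upper = math.exp(log_mu + z * log_sd)         # 97.5% prior quantile

print(f"median {median:.4%}, mean {mean:.3%}, 95% interval {lower:.4%} to {upper:.3%}")
```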

Analyses
For both prior settings we performed two types of analysis. First, we analysed all the data reported in Fig. 1 retrospectively. Second, to assess the predictive ability of our model, we performed estimation and forecasting assuming that only an initial portion of the data is available: we used the data up to week w as a training dataset to estimate the parameters, and then predicted the evolution of the epidemic after week w based on these estimates. We tested the following prediction time points: w = 3, 8, 13 and 18 weeks from the beginning of the new year.
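One common way to turn posterior draws into forecast bands is sketched here under stated assumptions: for each posterior draw, take the model-implied expected admissions μ_w for the weeks after the training cut-off, simulate Negative Binomial counts with r_w = μ_w/(η − 1) and success probability 1/η, and summarise the simulations week by week. The draws below are synthetic placeholders rather than posterior output from the paper, and `predictive_bands` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2018)


def predictive_bands(mu_draws, eta_draws, n_rep=20):
    """Median and 95% posterior predictive band of weekly ICU admissions.

    mu_draws: (n_draws, n_weeks) expected admissions for the forecast weeks,
    one row per posterior draw; eta_draws: (n_draws,) overdispersion (> 1).
    """
    mu = np.asarray(mu_draws, dtype=float)
    eta = np.asarray(eta_draws, dtype=float)[:, None]
    r = mu / (eta - 1.0)                       # NB size parameter per draw and week
    p = 1.0 / eta                              # NB success probability per draw
    sims = rng.negative_binomial(r, p, size=(n_rep,) + mu.shape)
    sims = sims.reshape(-1, mu.shape[1])       # pool replicates across draws
    lower, median, upper = np.percentile(sims, [2.5, 50, 97.5], axis=0)
    return median, lower, upper


# Synthetic stand-ins for posterior draws over a 4-week forecast horizon.
n_draws = 500
scale = rng.lognormal(0.0, 0.2, n_draws)
mu_draws = np.outer(scale, [50.0, 80.0, 60.0, 30.0])
eta_draws = 1.5 + rng.random(n_draws)

median, lower, upper = predictive_bands(mu_draws, eta_draws)
print(median, lower, upper)
```

Pooling several simulated replicates per draw propagates both parameter uncertainty and observation noise into the bands.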
To approximate the posterior distribution, we used a block-updating Metropolis-Hastings sampling algorithm [30], coded in the R programming language [31]. The system of differential equations (Eq. 4) was solved using the R package deSolve [32]. Details of the algorithm are available in Additional file 1, and the code is available at http://www.mrc-bsu.cam.ac.uk/software/miscellaneous-software/.
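The idea behind a block-updated random-walk Metropolis-Hastings sampler can be shown on a deliberately small problem. This Python sketch is not the authors' R code: it fits only a constant mean μ and overdispersion η to synthetic Negative Binomial counts, works on the transformed scale θ = (log μ, log(η − 1)) with a flat prior there (an assumption made for brevity), and updates both components jointly as one block with a multivariate normal proposal. Because the proposal is symmetric, the acceptance ratio reduces to the ratio of posterior densities.

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# Synthetic weekly counts: NB with mean 20 and variance 3 * 20 (eta = 3).
true_mu, true_eta = 20.0, 3.0
y = rng.negative_binomial(true_mu / (true_eta - 1.0), 1.0 / true_eta, size=200)
log_y_fact = sum(math.lgamma(v + 1) for v in y)   # constant term of the likelihood


def log_post(theta):
    """Log posterior of theta = (log mu, log(eta - 1)) under a flat prior on theta."""
    mu, eta = math.exp(theta[0]), 1.0 + math.exp(theta[1])
    r = mu / (eta - 1.0)
    return (sum(math.lgamma(v + r) for v in y) - len(y) * math.lgamma(r) - log_y_fact
            - len(y) * r * math.log(eta) + y.sum() * math.log1p(-1.0 / eta))


def block_mh(theta0, prop_cov, n_iter):
    """Random-walk Metropolis-Hastings updating all components of theta as one block."""
    chol = np.linalg.cholesky(prop_cov)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    samples, accepted = np.empty((n_iter, theta.size)), 0
    for i in range(n_iter):
        proposal = theta + chol @ rng.standard_normal(theta.size)  # joint block proposal
        lp_prop = log_post(proposal)
        if np.log(rng.random()) < lp_prop - lp:                    # symmetric proposal
            theta, lp, accepted = proposal, lp_prop, accepted + 1
        samples[i] = theta
    return samples, accepted / n_iter


samples, acc_rate = block_mh([math.log(y.mean()), 0.0], np.diag([0.04**2, 0.2**2]), 4000)
kept = samples[1000:]                                   # discard burn-in
mu_post = np.exp(kept[:, 0]).mean()
eta_post = (1.0 + np.exp(kept[:, 1])).mean()
print(f"acceptance {acc_rate:.2f}, posterior mean mu {mu_post:.1f}, eta {eta_post:.2f}")
```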

Retrospective analysis
Table 2 (informative prior, partial): probability of ICU admission given infection, p_ICU ∼ LogNorm(log μ = log(0.000239), log σ = 1) [25].
The retrospective analysis of the data was first performed in the uninformative scenario. The resulting posterior distributions are displayed in Fig. 3, and the posterior medians and 95% credible intervals (CrIs) of some of the parameters are reported in Table 3. Note that the posterior distribution of the basic reproduction number R_0 is almost identical to the prior. This is because the information contained in the data is not sufficient to determine separately the parameters describing the initial immunity and the transmission rate. For the same reason, the posterior distribution of π does not differ markedly from its prior, only excluding the small values that would completely prevent an epidemic from taking place. This problem is explored in detail in Additional file 1. The data are much more informative about the parameters η, p_ICU and κ. The highly variable behaviour of the ICU admission counts in season 2014/15 is reflected in the overdispersion parameter η, whose posterior distribution is concentrated on markedly higher values than those estimated for the 2012/13 and 2013/14 seasons. The probability of ICU admission given infection, p_ICU, lies between 0.004% and 0.04% in all seasons. Its median is higher in season 2014/15, in agreement with the higher severity detected during that influenza season [23]. The multiplicative factor κ, introduced to allow for a school-closure effect, is centred on 1 for season 2013/14 and on higher values in the remaining seasons. A possible explanation for this counter-intuitive result lies in the age distribution of the sample population. The age distribution of our data differs from that of the English population [23,28]: patients over 65 are over-represented and school-age children are under-represented.
Elderly individuals are perhaps more likely to meet other potential influenza spreaders (e.g. children) during school closures, particularly over the Christmas holiday. It is therefore plausible to observe an inverse relationship between school closure and the transmission rate, in contrast to the results that might be expected from a more representative sample of the population [10]. However, this piecewise increase in the transmission rate may also capture other time-varying phenomena that affect the force of infection. The Christmas holiday often coincides with the beginning of a colder and more humid period and with changes in vapour pressure, which might favour the spread of influenza [33]. Lastly, the posterior median of the effective reproduction number R_n is equal to 1. Although the CrIs of κ included 1, the posterior probability of it being larger than 1, Pr(κ > 1), is substantial for two seasons. The introduction of this parameter provides the flexibility needed to represent the specific features of each season. This can be observed in the posterior predictive distribution of the weekly ICU admissions reported in Fig. 4. Specifically, in season […] allow precise inference of both the parameters and the predictions.
The same analysis was performed in the second scenario, i.e. with informative priors on the susceptibility π and on p_ICU as defined in Table 2. The introduction of these prior distributions compensates for the lack of information, allowing the identification of π and improving the precision of the posterior distribution of p_ICU. This also affects other parameters, such as β and R_0. However, their posterior distributions are driven by the prior distributions alone and are not informed by the data. The fit did not improve. Results are reported in Additional file 1.
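Given posterior draws of κ, the posterior median, the 95% CrI and Pr(κ > 1) are simple sample summaries; a credible interval can include 1 while most of the posterior mass lies above 1, as described above for two seasons. The draws below are synthetic (a lognormal centred near 1.15), used only to illustrate the calculation, and `summarise_draws` is a hypothetical helper.

```python
import numpy as np


def summarise_draws(draws, threshold=1.0):
    """Posterior median, central 95% credible interval and Pr(parameter > threshold)."""
    draws = np.asarray(draws, dtype=float)
    lower, median, upper = np.quantile(draws, [0.025, 0.5, 0.975])
    return {"median": median, "cri95": (lower, upper),
            "prob_above": float(np.mean(draws > threshold))}


rng = np.random.default_rng(1)
kappa_draws = rng.lognormal(mean=np.log(1.15), sigma=0.1, size=10_000)  # synthetic draws
print(summarise_draws(kappa_draws))
```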

Prediction
The prospective analysis of the data in the uninformative scenario resulted in very wide predictions of the future dynamics; we therefore assumed the informative priors reported in Table 2. The performance of the model at different prediction times is plotted in Fig. 5 for each season.
Season 2013/14, despite displaying the most regular data, is the most difficult to predict: the well-defined initial growth biases the predictions towards a major outbreak. As a result, the median and the credible intervals of the posterior predictive distribution overestimate the data until mid-March (week 13 from the beginning of the year). For the other two seasons, the median of the predicted weekly ICU admissions is always very close to the data points, but the credible intervals narrow to reasonable bounds only towards the end of February (week 8 from the beginning of the year).
Prediction is challenging, as shown by the limited precision of the predictions. For example, when the epidemic is still taking off (i.e. at the third week of January), the 95% CrI of the number of ICU admissions predicted 3 weeks in advance is as wide as […]. Nonetheless, as with most epidemic models that attempt prediction [13,34], the results are not useful (i.e. precise enough to determine a health policy response) until after the epidemic has peaked.

Further results
We simulated weekly counts of hospital admissions in the case of a pandemic and extended our model to enable inference of the parameters from these data. Despite the increased number of observations, the model performed very similarly to the case of non-pandemic ICU count data. We diagnosed identifiability problems in the uniform prior scenario, and predictions were good only when more informative prior distributions (on the susceptibility and the probability of hospitalization) were included. Results from this analysis are reported in Section 5 of Additional file 1.
Other analyses performed include a prospective analysis in the uninformative scenario and a retrospective analysis in the informative scenario. Results of these analyses are reported in Section 4 of Additional file 1.

Discussion
In this paper we proposed a model to estimate and predict influenza outbreaks from routinely collected data on admissions to ICU/HDU.
We investigated the performance of the proposed model on both simulated and real data. By fitting the model to simulated weekly numbers of ICU admissions, we found that, even with very vague prior information, we could obtain estimates of some of the main parameters, including the initial infection rate, the probability of ICU admission given infection, the effective reproduction number R_n and the scaling factor for school holidays κ. When we added prior information on the distribution of the average immunity (1 − π) and on p_ICU, estimates of the remaining parameters could also be obtained. We were also able to forecast the evolution of the outbreak by analysing the first months of the epidemic, using data up to the peak of influenza activity.
The model was applied to real data on the weekly number of ICU admissions from seasons 2012/13, 2013/14 and 2014/15, confirming the performance obtained on the simulated data. The estimated values of the effective reproduction number R_n were similar to those estimated over the past decade of seasonal influenza [8]. A scaling parameter allowed the transmission rate to vary between school-term and holiday/half-term periods, resulting in a good fit of the model to the data for most of the seasons considered. A more complete investigation of the temporal variation of the transmission rate might improve the flexibility of our model, and therefore its fit to more anomalous epidemics.
Recently, a similar analysis was performed on the Finnish influenza pandemic of 2009 [16] using a more elaborate model, analysing confirmed data on both hospitalizations and GP consultations. Their inclusion of GP data enhances the performance of the inference. Nevertheless, these data are harder to collect in a larger population (England is almost 10 times more populous than Finland) and outside pandemic emergencies. By contrast, inference with our model is driven by fewer data, which are, however, readily available, even in real time, in seasonal settings. A further feature of the model in [16] is that the transmission parameter varies over time according to a Gaussian Process: this allows an accurate description of the past dynamics but makes prediction infeasible, since this temporal variation cannot be forecast. By contrast, our simple piecewise-constant model is able to forecast the future trend well, while including enough flexibility to describe the present and past data appropriately.
Our work also has some limitations. Firstly, our model is not age-specific. The assumption of homogeneous mixing across regions and age groups is very strong, but it was dictated by the very small sample sizes, which did not allow sub-grouping. Secondly, the quality of some estimates and predictions relies strongly on prior information on the proportion of non-immune people. As this information is needed to overcome the lack of identifiability of the parameters, we used sero-prevalence data following the 2010/11 epidemic. This is unlikely to be correct for all three seasons analysed, as the predominant circulating strain differed across seasons. Likewise, the model that describes the time elapsing between infection and ICU admission is assumed to be fixed and mostly known, but this assumption is unlikely to be valid. The other element that defines the observational process, i.e. the probability of ICU admission given infection, is also sensitive to the choice of prior distribution.

Conclusion
The work presented here is a proof of concept of the potential for estimation and prediction of influenza transmission from USISS data. At the same time, the results highlight the need to collect external data to formulate an appropriate prior distribution on the initial immunity of the population, particularly in the event of a pandemic.
The availability of this information, together with the tool we have provided here, makes it possible to infer the epidemic parameters retrospectively from routinely collected data on severe cases during seasonal outbreaks and to predict the temporal dynamics of new epidemics.