A large-scale immuno-epidemiological simulation of influenza A epidemics.

Background Agent based models (ABM) are useful to explore population-level scenarios of disease spread and containment, but typically characterize infected individuals using simplified models of infection and symptoms dynamics. Adding more realistic models of individual infections and symptoms may help to create more realistic population level epidemic dynamics. Methods Using an equation-based, host-level mathematical model of influenza A virus infection, we develop a function that expresses the dependence of infectivity and symptoms of an infected individual on initial viral load, age, and viral strain phenotype. We incorporate this response function in a population-scale agent-based model of influenza A epidemic to create a hybrid multiscale modeling framework that reflects both population dynamics and individualized host response to infection. Results At the host level, we estimate parameter ranges using experimental data of H1N1 viral titers and symptoms measured in humans. By linearization of symptoms responses of the host-level model we obtain a map of the parameters of the model that characterizes clinical phenotypes of influenza infection and immune response variability over the population. At the population-level model, we analyze the effect of individualizing viral response in agent-based model by simulating epidemics across Allegheny County, Pennsylvania under both age-specific and age-independent severity assumptions. Conclusions We present a framework for multi-scale simulations of influenza epidemics that enables the study of population-level effects of individual differences in infections and symptoms, with minimal additional computational cost compared to the existing population-level simulations. Electronic supplementary material The online version of this article (doi:10.1186/1471-2458-14-1019) contains supplementary material, which is available to authorized users.

It was reported that across all the studies, of the 67 individuals measured for symptoms, 65 experienced at least one symptom. In all studies that measured symptoms, symptom scores were measured twice a day (hour 0 and hour 12 in each day) and reported at the first time point in the day (hour 0). For this reason, we use hour 6 as the time point to describe symptoms.

Data abstraction
The viral titers across the five studies ( [1,2,3,4,5]) are all reported in log 10 T CID 50 /mL, and as averages across individuals in the respective studies. Plots of viral load curves show that qualitatively they look similar except for variations in the viral magnitude between studies. Mapping viral load to a measure of infectivity for disease transmission requires normalization of the viral load, and the important components of the data to capture are when the viral load increases, peaks, and decreases. To adjust the data, we use mixed-effects modeling to compute a vertical shift for each data curve (random-effect), assuming that the distance between each curve is minimized. For each study, we get the shifts in table 2. We adjusted each curve by the shift and computed weighted means and standard deviations (fixed-effect) on the shifted data. The mean of the data is the same with and without the shifts, but the standard deviation is smaller with the shifts and reflects variability between each study from one mean trajectory.  Ensemble model trajectories (see Figure 3 in main text) for viral load with each curve readjusted by shift to compare with the original data from the 5 studies: [1, 2, 3, 4, 5]

Parameter Estimation
We write the ODE model for short asẋ = f (x, α), in which x denotes the vector of state variables, and α = (α 1 , α 2 , ..., α 6 ) is the vector of parameters of the system. The observed data is denoted by D = (x i (t j ), σx i (t j ) ), for response i, at time point j, with standard deviation σ. Here, 2 is the number of variables we fit data to, and we fit at time points 1 to 8 days (or 1.25 to 8.25 days).
Using the standard assumption that the errors on the data are uncorrelated, random, and sampled from a Gaussian kernel, the likelihood P (D|α) is computed from the trajectory x(t; α) of the model as an exponential of the objective value, i.e., as P (D|α) = exp(−C(α)), given by [6] .
Optimal parameter values for the target-cell model for A/Texas/36/91 are reported in [7], which we use as baseline except we increase the initial inoculum V 0 from 1e-5 to 0.01. We set upper and lower bounds using biological ranges for parameters c, δ, k, and V 0 [8], and 3.5 log scales around the baseline values for the additional parameters, with the exception of the viral production rate p. Preliminary fits showed values of p exploring outside of the bounds, so we increased the upper bound accordingly.
To sample the Bayesian posterior density P (α|D), we choose to run 4 parallel chains of length 1,000,000. We choose inverse temperature values of β = (1, 0.5, 0.25, 0.125), and step size = 0.075/ √ β. The step size for each chain is chosen so that each chain will have acceptance ratio (i.e., the ratio of accepted α * to the total number of proposed α * ) approximately equal to 0.23, which has been show to provide optimal convergence speed [9]. Acceptance rates are (0.2475, 0.2479, 0.2546, 0.2797). Average swapping rates, from highest energy down are (0.3558, 0.3529, 0.4117).

Response surfaces
The computed domain for response surface evaluation is [−0.5, 0.5] × [−0.5, 0.5]. We linearly mapped input values in [0, 1] to these intervals to use uniform random numbers as input. These arrays may be evaluated up to 11 days of infection, via: For viral load we have: and for symptoms:

Calibration of the intra-host model in FRED
FRED is an open source, C ++ modeling system developed by the University of Pittsburgh Public Health Dynamics Laboratory in collaboration with the Pittsburgh Supercomputing Center and the School of Computer Science at Carnegie Mellon University. In Allegheny County, there are a total of 1,164,879 individuals represented by synthetic computer agents assigned characteristics and behaviors (age, sex, occupation, household etc.) [10]. We use data from the synthetic population database, which is a freely available database based on 2005-2009 U.S. census data [11]. The data includes, for each household, a latitude/longitude coordinate, income, size, and the sex, race, and age of household occupants, representing the distribution of U.S. households. Allegheny County has 524584 households, with 532 schools, 48703 workplaces, 195 group quarters, and 30583 group quarter residents.
Contact rates between individuals have assigned probabilities, which are derived from MIDAS studies from the 1957-58 Asian influenza pandemic: Ferguson et al. [12], Longini et al. [13], Germann el al. [14] and Halloran et al. 2008 [15], and are summarized in [16]. Each type of place (Home, School, Work, Neighborhood) is characterized by two sets of scalar parameters: the number of contacts per infectious person per day, and the probability that a contact transmits infection.
The number of contact opportunities per place per day are calibrated to obey a "30-70" rule, which is that 30% of viral transmission occurs in the home, and 70% occurs elsewhere. These numbers are calibrated as an inverse problem, in which contact opportunities is an input parameter, and measured so that output epidemics obey the desired transmission proportions. Our target for a pandemic is a 33% symptomatic attack rate (ARs) with a global attack rate (AR) of 50%. The 33% clinical attack rate for the baseline emerges naturally from our symptoms threshold, and the calibration process is further refined to maintain this ratio.
For the "baseline" within-host model, the transmission rates per person per location per day are 0.21 in the house, 39.9 in the neighborhood, 1.5 at work, and 13.97 in schools. Over 50 runs, this gives an average of 30.01% transmission in the house, 32.96% in the neighborhood, 24.7% in school and 12.31% at work. For the intra-host model found as default in FRED, the calibration process is the same, but for Allegheny County these give different rates per place per day, which are household contacts = 0.20, neighborhood contacts = 42.48, school contacts = 14.32, and workplace contacts = 1.59.

Within-host model in Reference ABM intra-host model
We describe the within-host assumptions in a reference intra-host model used in ABM studies, which we call the "reference" model [17,18]. The reference FRED model is the withinhost model used when FRED is downloaded or used on the website (http://fred.publichealth.pitt.edu/index.php) in the default mode. In the reference withinhost model in FRED, the number of days of infectiousness and symptoms are selected from a probability distribution. Each infected agent is either symptomatic or not, based on a Bernoulli random variable. The infectivity of an agent is assigned one of three values: 0 if the agent is uninfected, 0.5 if the agent is infected and asymptomatic, or 1.0 if the agent is infected and symptomatic. An infected agent passes through stages SEIR or SEiR, in which "I" means infectious and symptomatic, and "i" means infectious and asymptomatic. The length of these phases is determined from sampling cumulative distribution functions which are based on continuous distribution functions that approximate an exponential distribution.

(3)
The values for symptomaticity use infectivity and are assigned values 0 or 1. Days incubating are calculated as either the same number as days latent if the individual is symptomatic, or the duration of infectiousness if the individual is asymptomatic. When symptomaticity is 1.0, it dictates an individual's behavior. For the reference ABM model, the mean duration of infectivity (4.1 days) and mean latent period (1.2 days) can be evaluated directly using the discrete probability distributions explicitly used in the model, and give an average total duration of infection as 5.3 days from inoculation to resolution. We estimate the basic reproductive rate (R 0 ) as the average number of secondary infections by the initial 100 individuals that are exposed at the beginning of the simulation.

Age-severity models
We use 3 models, both mapping f : age → [0, 1], where x 1 = f (age) is a measure of disease of severity via evaluation of the response surface for direction 1. Here we describe the 3 models.

Linear model
We assume that individuals with age a 1 or younger develop similarly mild infections (x 1 = 0) and those aged a 2 or older develop similarly severe infections (x 1 = 1): We use a 1 = 18 and a 2 = 65. The value of x 1 = 0.5 corresponds to age of 41.5, which is consistent with the median age of 41 years in Allegheny County. For age ∈ [0, 100] the area under the curve is 58.5.

'U' and 'W' curve models
We use mortality data published in Luk et al. [19] to obtain the shapes of the curves. We assigned each data point to the center point age and connected data points using piece-wise linear interpolation. We set the first and last points (corresponding to < 1 and > 84 years) to a value of 1, corresponding to most severe disease, and scaled the interior data points so that the area under the curves would be 58.5.