Ethics
Ethics approval was not required for this study because data are publicly available from the JHU coronavirus data repository [1] (https://github.com/CSSEGISandData/COVID-19). These data are daily reported confirmed case, recovery, and death numbers that do not contain any confidential or identifiable patient data. Theses publicly available data have been widely utilised by governments, news outlets and other epidemiological studies.
Data summary
Daily counts of reported confirmed COVID-19 cases, recoveries and deaths for each country are obtained from the JHU coronavirus resource center [1, 18] (publicly available at https://github.com/CSSEGISandData/COVID-19). We refer the reader to the “Discussion” section for comments on this data source. Population data for 2020 were obtained from United Nations Population Division estimates [29].
We analyse three different time periods: i) 22 January–30 March; and ii) 22 January–13 April; and iii) 22 January–9 June. These periods are selected as they broadly represent the time period of the initial outbreak of COVID-19; covering the initial exponential growth period, the first epidemic peak, and subsequent initial recovery period of many countries. We use these time points to look at the changes in key model parameters relating to a countries responses over time.
Countries are included in the analysis for a give time period provided the cumulative number of confirmed COVID-19 cases exceeded 100 at least one day prior to the end of the particular analysis period. While the specific threshold is empirically chosen as 102, the lower bound on the initial case numbers is needed to ensure that the infectious population is large enough for sufficient mixing to occur so that a compartmental model is a reasonable approximation (i.e., as threshold of 101 initial cases will lead to poor approximations and 103 cases will miss the early stages of the pandemic in many countries). The initial case number threshold should be exceeded at least one day before the end of the analysis period to ensure there are at least two observations in the time series for all countries. Using these inclusion criteria, we obtain N=98 countries for the period of 22 January–30 March, N=121 countries for period of 22 January to 13 April, and N=158 countries for the period 22 January–9 June. Countries included in one period are not removed from subsequent periods and lower values of N in earlier periods reflect the fact that fewer countries had experienced outbreaks by that time.
Analysis summary
For each country, i=1,2,…,N, the JHU maintains a time-series, \({\mathcal {D}}_{i} = \left [\left \{C_{t,i},R_{t,i},D_{t,i}\right \}_{T \geq t \geq 0}\right ]\), where Ct,i,Rt,i, and Dt,i are, respectively, the cumulative confirmed cases, case recoveries and case deaths on day t for country i, t=0 is the first day such that Ct,i≥100 and t=T is the end of the study period. Since there are variations in reporting protocols across countries and time as well as data curation challenges [2], caution is necessary in the interpretation of our analysis across all countries over time.
Bayesian parameter inference is applied over three time periods. The first period, 22 January to 30 March, is used to assess the community response to the initial outbreak of COVID-19. The second period, including data up to 13 April, encompasses the time period in which the efficacy of the community response starts to become evident. Finally, the third period includes data up to the 9 June, in which many countries had started to relax restrictions. We also consider in this analysis the prevention of future waves, and highlight the sensitivity of system dynamics to the uncertainty in unobserved infectious individuals.
Mathematical model
A stochastic epidemiological compartmental model is used to describe the spread of COVID-19 within a single country over the time period t∈(0,T]. The assumed well-mixed population of size P is comprised of six compartments: susceptible, St; infectious, It; confirmed active cases, At; case recoveries, Rt; case fatalities, Dt; and unconfirmed recoveries, \(R_{t}^{u}\). The population that is susceptible to the SARS-CoV-2 infection, St, can be infected by individuals from the unobserved infectious population, It, including both symptomatic and asymptomatic infections. The active confirmed cases, At, are those who have tested positive for COVID-19 but have not yet recovered or died. We assume individuals in At are isolated from the susceptible community (e.g., self-isolated, quarantined, or hospitalised) and no longer contribute to new infections. Importantly, At need not be symptomatic, but may have been identified from contact tracing protocols or community wide testing. Rt and Dt are, respectively, the population of confirmed cases that recover or die. These correspond to the recoveries and deaths reported in the JHU data. Lastly, \(R_{t}^{u}\) is the population of infected individuals that recover or die without being tested for COVID-19; these individuals no longer spread the infection but do not contribute to the reported recovery and fatality counts. The cumulative confirmed cases, as reported by the JHU, can be obtained by Ct=At+Rt+Dt.
The populations St,It, and \(R_{t}^{u}\) are not observable and are latent variables in our model. Therefore, strategies for managing the spread of the virus, such as NPIs, are informed by the observables, At,Rt, and Dt. Media coverage, official government information, and health authority reports based on these observables may also affect the behaviour of individuals. For example, frequent reports on growing case numbers may increase voluntary self-isolation; conversely, media coverage that downplays the risk of infection or seriousness of the disease may lead to widespread non-compliance with health advice or government regulations. We model this dynamic introducing a feedback loop within the transmission process.
A schematic of this system that highlights the state transitions and the feedback loop is given in Fig. 1. The dynamics can be described by the differential equations,
$$ \begin{aligned} {\dot{S}_{t}} &= -g\left(A_{t},R_{t},D_{t}\right)S_{t} I_{t}/P, \quad {\dot{I}_{t}} = -(\gamma + \eta\beta)I_{t} + g(A_{t},R_{t},D_{t})S_{t} I_{t}/P, \\ {\dot{R}^{u}_{t}} &= \eta \beta I_{t}, \quad {\dot{A}_{t}} = \gamma I_{t} - (\beta + \delta)A_{t}, \quad {\dot{R}_{t}} = \beta A_{t}, \quad\text{and}\quad {\dot{D}_{t}} = \delta A_{t}. \\ \end{aligned} $$
(1)
Here, g(·)>0 is the transmission rate function, γ>0 is the identification rate, β>0 is the case recovery rate, δ>0 is the case fatality rate, and η>0 is the latent removal rate relative to the case recovery rate. Initial conditions for the observables, A0,R0,D0, are obtained from the JHU data. To capture uncertainty in community spread at early time we set I0=κA0, where κ>0 is the relative number of unobserved cases. We assume initially \(R_{0}^{u} = 0\) and S0=P−C0−I0. Although Eq. (1) shows a deterministic system for ease of interpretation, we apply a stochastic equivalent for the analysis in this work (See Supplementary Material).
The novel feedback mechanism provides a general framework to describe how communities change their behaviour as case numbers rise. This is similar to the influence of media reports that have been the subject of study for influenza and HIV [30, 31]. However, our approach includes a response strength parameter.
We define a so-called reporting function,
$$ U(A_{t},R_{t},D_{t}) = w_{A} A_{t} + w_{R} R_{t} + w_{D} D_{t}, $$
(2)
where the weights wA,wR,wD≥0, represent the relative weighting of observables in contributing to information that influences individual behaviour, introduction of NPIs, and subsequent compliance with government regulation or health advice. In the context of this work, the weights wA,wR, and wD have a very important interpretation, but we first need to present more details of the feedback mechanism.
We consider a nonlinear transmission rate of the form,
$$ g(A_{t},R_{t},D_{t}) = \alpha_{0} + \alpha f(U(A_{t},R_{t},D_{t})), $$
(3)
where the response function, f(·)∈[0,1], is a decreasing function with respect to U(·),α is the controllable transmission rate such that αf(·) is a transmission rate that decreases as the reporting function increases and α0 is the residual transmission rate as f(·)→0. The strength of the response s=(1−f(·))×100% is the percentage reduction in community transmission, excluding residual transmission α0.
For the response function we assume the form
$$ f(U(\cdot)) = \frac{1}{1+U(\cdot)^{n}}, $$
(4)
where the parameter n≥0 controls the rate of decrease with respect to the reporting function. This form is selected for two reasons. Firstly, it generalises techniques that capture the influence of media reports during epidemics [30]. Secondly, the weights from Eq. (2) have an important interpretation. This can been seen by noting that values for At,Rt and Dt that satisfy the condition U(At,Rt,Dt)=1, indicate the threshold case numbers that leads to a response strength of 50%, that is, f(U(·))=1/2 leading to g(·)=α0+α/2. The effect of the slope parameter, n, and the weights are shown in Fig. 2. If U(·)=0, that is no cases are reported, or wA=wR=wD=0, indicating no perceived risk, then the model reduces to an SIR model in the unobserved population with transmission rate α0+α.
For n≤1 the shape of f(U(·)) starts to decline rapidly levelling out (Fig. 2a). Increasing n>1 results in a decreasing sigmoid curve with an inflection point at U(·)=1 in which a population response strength reaches 50%. Small n describes a population that does not significantly reduce the transmission rate until the U(·) is large. Conversely, larger n describes a population that acts decisively as a response that rapidly reduces transmission around U(·)=1. Large values of weights wA,wR,wD correspond to lower acceptable thresholds of cases, including active, recovery and death counts. Lower weights lead to delayed responses. That is, the parameter n relates to the rate of intervention introduction and the weights relate to decision thresholds and subsequent compliance. Importantly, our approach does not distinguish between different NPIs and voluntary population behaviour, but rather models the net effect that reporting has on transmission rates.
We focus on the reporting function with wR=wD=0, that is, U(At,Rt,Dt)=wAAt (see “Discussion” section and Supplementary Material for alternatives). Since At will increase and decline over to course of an outbreak, the model can exhibit oscillations that are essential for understanding the potential for flare-ups and multiple waves.
Bayesian analysis
Parameter inference
Our model has up to 11 parameters: two transmission rate parameters, α0 and α; case recovery rate β; case identification rate γ; case death rate δ; relative latent recovery rate η; response slope parameter n; the initial infected scale factor κ; and the weights of the reporting function wA,wR, and wD. We assume wR=wD=0 and infer the parameters θ=[α0,α,β,γ,δ,η,n,κ,wA] (see SupplementaryMaterial for sensitivity analysis for the general case).
Using the daily case data, \({\mathcal {D}}_{i}\), for each country i∈[1,2,…,N], we infer model parameters within a Bayesian framework by sampling the joint posterior distribution,
$$ {p({{\theta}}\mid{{\mathcal{D}}_{i}})} \propto {p({{\mathcal{D}}_{i}}\mid{{\theta}})} {p({\theta})}, $$
(5)
where \({p({{\mathcal {D}}_{i}}\mid {\theta })}\) is the likelihood and p(θ) is the prior distribution.
We rely on adaptive sequential Monte Carlo for approximate Bayesian computation (SMC-ABC) [32–35] to obtain approximate posterior samples since the likelihood is intractable (Supplementary Material). We use independent uniform priors, \(\alpha _{0},\alpha,\beta,\gamma,\delta,\eta \sim \mathcal {U}(0,1), n \sim \mathcal {U}(0,20), \kappa \sim \mathcal {U}(0,100)\), and \(\log _{10} w_{A} \sim \mathcal {U}({-6},{-2})\).
Assessment of model fit and prediction
The highly variable nature of the COVID-19 pandemic makes it notoriously difficult to predict [20, 21]. Our purpose is not to provide forecasts, but rather to capture the dynamic effects of changes in community behaviour during the outbreak. As a result, our model needs to be able to capture the overall trends in daily cases.
Model fit is assessed through sampling the posterior predictive distribution
$$ {p({{\mathcal{D}_{s}}}\mid{{\mathcal{D}}_{i}})} = \int {p({{\mathcal{D}_{s}}}\mid{{\theta}})}{p({{\theta}}\mid{{\mathcal{D}}_{i}})}\, \mathrm{d}{\theta}, $$
(6)
where \({\mathcal {D}_{s}}\) is simulated data as generated by the model. We compute the 50% and 95% credible interval (CrI) of \({p({{\mathcal {D}_{s}}}\mid {{\mathcal {D}}})}\) by: 1) generating simulated data for each posterior sample generated by the SMC-ABC sampler; and 2) computing quantiles of the simulation state distribution at each observation time.
The posterior predictive distribution is also used to asses flare-up risk. After model calibration using data up to 9 June, we continue simulations to 24 June and obtain CrIs for oscillatory behaviour relating to localised flare-ups and additional waves.
Parameter point estimation and uncertainty quantification
Parameter point estimates are also obtained from the approximate posterior sample with the lowest average discrepancy with the observed data (See SupplementaryMaterial). Parameter uncertainty is reported using 95% CrI for the marginal approximate Bayesian posterior distributions (See Supplementary Material). This uncertainty quantification encompasses all plausible parameter combinations within the achieved acceptance threshold of the ABC-SMC method.
Correlation analysis
We use the point estimates to evaluate which factors have had the greatest impact on the COVID-19 outbreak evolution across countries. For each country, we manually classify the state of the outbreak for each country on 13 April based on the trend in the daily reported cases and active case numbers. These stages are: the growth stage–characterised as an increasing trend in daily reported cases numbers; the post-peak stage–characterised by declines in daily case numbers, indicating the curve is flattening; the recovery stage–characterised by declines in active case numbers. Spearman’s rank-order correlation coefficients are computed between each parameters and observed data at T= 13 April (i.e., cumulative case numbers, CT, recoveries, R, and deaths, DT).