Low dispersion in the infectiousness of COVID-19 cases implies difficulty in control

The individual infectiousness of coronavirus disease 2019 (COVID-19), quantified by the number of secondary cases of a typical index case, is conventionally modelled by a negative-binomial (NB) distribution. Based on patient data of 9120 confirmed cases in China, we estimated the variation in individual infectiousness, i.e., the dispersion parameter k of the NB distribution, at 0.70 (95% confidence interval: 0.59, 0.98). This suggests that the dispersion in individual infectiousness is probably low; thus, COVID-19 infection is relatively easy to sustain in the population and more challenging to control. Rather than focusing only on the much rarer superspreading events, we also need to focus on almost every case to effectively reduce transmission.


Introduction
Since the early outbreak of the coronavirus disease 2019 (COVID-19) pandemic, huge efforts have been devoted to estimating key epidemiological parameters because of their important implications for mitigation planning. For instance, according to a survey posted in a public domain (https://github.com/midas-network/COVID-19/tree/master/parameter_estimates/2019_novel_coronavirus), at least 47 studies (either peer-reviewed or not) have been posted on the cumulative case count in a location, 39 on the reproductive number R0 (the number of secondary cases that may be caused by a typical primary case), 13 on the incubation period (time delay between infection and symptom onset), 6 on the serial interval or generation interval (time delay between symptom onset or infection of an index case and that of its secondary case in a transmission chain), and 6 on the symptomatic case fatality ratio. However, the individual variation in infectiousness, the dispersion parameter (k), has been largely overlooked, except for one early work in Eurosurveillance [1]. He et al. [2]. Notably, mathematical modelling work based on imported and reported case numbers in a variety of countries suggested that k could be 0.1 (95% CI: 0.05, 0.2) [3]. The recent study of Lau et al. [4] used a spatiotemporal transmission process model and estimated the overall dispersion parameter k at 0.45 for Cobb County, 0.43 for Dekalb, 0.39 for Fulton, 0.49 for Gwinnett, and 0.32 for Dougherty in Georgia, USA. In this work, with a larger dataset, we calculate k using the empirical offspring distribution approach. Our data are from mainland China, where strict surveillance guaranteed the quality of the data. Since we adopted the basic definition approach, our methods do not rely on the additional assumptions typically needed for mathematical modelling.

Method
The negative binomial (NB) distribution is used to model the distribution of secondary case numbers, i.e., the offspring numbers, of an index case. The dispersion parameter k (i.e., size, which is nonnegative) controls the variation of the NB distribution. A sufficiently small k implies that the majority of disease transmission is driven by a few superspreaders, and thus the spread is likely to be controlled by preventing superspreading events. A large k implies that the NB distribution approaches a Poisson distribution, and that the virus easily persists and is difficult to eradicate. Following the pioneering work of Lloyd-Smith et al. [5], we assume that the number of secondary cases, denoted by Z, for a typical primary case follows NB (mean = R0, dispersion = k), and thus the variance is R0 + R0^2/k. When k is sufficiently small, the distribution has a peak at 0, and in the limit k = 0, the NB distribution is concentrated at zero. When k = 1, the distribution is a geometric distribution; and when k approaches infinity, the NB distribution approaches a Poisson distribution with both mean and variance equal to R0 [5].
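The parameterisation described above can be checked numerically. The following is a minimal sketch in Python, using only the standard library; the function names nb_pmf and nb_moments and the truncation point z_max are our own illustrative choices and not part of the original analysis. It evaluates the NB probability mass function with mean R0 and dispersion k, and recovers the mean and variance by direct summation.

```python
import math

def nb_pmf(z, r0, k):
    """P(Z = z) for a negative binomial with mean r0 and dispersion k."""
    # Equivalent to the (size = k, prob = k / (k + r0)) parameterisation.
    log_p = (math.lgamma(z + k) - math.lgamma(k) - math.lgamma(z + 1)
             + k * math.log(k / (k + r0))
             + z * math.log(r0 / (k + r0)))
    return math.exp(log_p)

def nb_moments(r0, k, z_max=2000):
    """Mean and variance obtained by summing the pmf up to z_max (an assumed cut-off)."""
    probs = [nb_pmf(z, r0, k) for z in range(z_max + 1)]
    mean = sum(z * p for z, p in enumerate(probs))
    var = sum((z - mean) ** 2 * p for z, p in enumerate(probs))
    return mean, var

# The variance should agree with r0 + r0**2 / k for any valid (r0, k).
mean, var = nb_moments(r0=0.69, k=0.70)
print(mean, var, 0.69 + 0.69 ** 2 / 0.70)
```

With k = 1 the same function reduces to a geometric distribution with success probability 1/(1 + R0), and with a very large k it approaches the Poisson probabilities, which matches the limiting cases stated above.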

Results and discussion
The parameter k plays an important role in explaining the wide spread of COVID-19 worldwide, given that its R0 is similar to that of another coronavirus disease, severe acute respiratory syndrome (SARS). Lloyd-Smith et al. [5] estimated a smaller k = 0.16 for the SARS outbreak in Singapore in 2003.
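To illustrate why k matters at a similar R0, note that under the NB model the probability that an index case generates no secondary case is (1 + R0/k)^(-k). The sketch below evaluates this for the SARS estimate k = 0.16 [5], for k = 0.70 as estimated in this study, and for the geometric case k = 1. The shared value R0 = 2.0 is an assumption chosen purely for illustration and is not an estimate from this work; the function name is likewise hypothetical.

```python
def prob_no_secondary_cases(r0, k):
    """P(Z = 0) under NB(mean = r0, dispersion = k), i.e. (1 + r0/k) ** (-k)."""
    return (1.0 + r0 / k) ** (-k)

ILLUSTRATIVE_R0 = 2.0  # assumption for illustration only, not an estimate from this study

for label, k in [("SARS, Singapore 2003 [5]", 0.16),
                 ("this study", 0.70),
                 ("geometric reference", 1.0)]:
    p0 = prob_no_secondary_cases(ILLUSTRATIVE_R0, k)
    print(f"k = {k:.2f} ({label}): P(Z = 0) = {p0:.3f}")
```

At a fixed mean, a smaller k gives a larger share of index cases that infect nobody, so that transmission is concentrated in fewer cases; a larger k spreads transmission over more cases.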
We first tried the method of Riou et al. [1] to calculate R0 and k in six countries (see Table 1), and found that R0 is in line with early World Health Organization (WHO) estimates, while k cannot be reliably estimated. We then obtained the numbers of secondary cases from a study by Xu et al. [9] (see Table 2), and estimated k = 0.70 (95% CI: 0.59, 0.98) and R0 = 0.69 (95% CI: 0.62, 0.77) using a profile likelihood approach. The profile log-likelihood of the NB model given the data in the R0 versus k plane is shown in Fig 1. This estimate of k is larger than that of SARS (around 0.16), but close to that of the 1918 pandemic influenza, 0.94 (95% CI: 0.59, 1.72) [2]. Our estimate is in line with that of Bi et al., 0.58 (95% CI: 0.35, 1.18) [10]. However, we have 9120 confirmed cases, compared with the 391 confirmed cases of Bi et al., and thus our estimate has a narrower confidence interval.
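One way a profile likelihood estimate of k can be obtained from an offspring frequency table is sketched below in Python. This is an illustration under stated assumptions, not the code used for the estimates above: the offspring table toy_counts is hypothetical (it is not the data of Xu et al. [9]), the grid of k values is an arbitrary choice, and the function names are our own. The sketch uses the fact that, for a fixed k, the maximum likelihood estimate of the NB mean is the sample mean, and takes the 95% interval as the set of k whose log-likelihood lies within 1.92 (half the 0.95 quantile of a chi-square distribution with one degree of freedom) of the maximum.

```python
import math

def nb_loglik(counts, r0, k):
    """Log-likelihood of NB(mean = r0, dispersion = k) for a {z: frequency} table."""
    total = 0.0
    for z, n in counts.items():
        total += n * (math.lgamma(z + k) - math.lgamma(k) - math.lgamma(z + 1)
                      + k * math.log(k / (k + r0))
                      + z * math.log(r0 / (k + r0)))
    return total

def profile_k(counts, k_grid):
    """Profile log-likelihood over a grid of k values."""
    n_cases = sum(counts.values())
    # For fixed k, the MLE of the NB mean is the sample mean.
    r0_hat = sum(z * n for z, n in counts.items()) / n_cases
    profile = [(k, nb_loglik(counts, r0_hat, k)) for k in k_grid]
    k_hat, ll_max = max(profile, key=lambda t: t[1])
    # 95% interval: grid values of k within 1.92 log-likelihood units of the maximum.
    inside = [k for k, ll in profile if ll_max - ll <= 1.92]
    return r0_hat, k_hat, (min(inside), max(inside))

# Hypothetical offspring table, for illustration only.
toy_counts = {0: 600, 1: 250, 2: 90, 3: 35, 4: 15, 5: 6, 6: 4}
k_grid = [0.05 * i for i in range(1, 201)]  # k from 0.05 to 10.0 (assumed range)
r0_hat, k_hat, k_ci = profile_k(toy_counts, k_grid)
print(r0_hat, k_hat, k_ci)
```

A grid search is used here for transparency; a numerical optimiser could replace it, and the resolution of the interval is limited by the grid spacing.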
Our results suggest that the majority of COVID-19 transmission is not due to superspreading events. The number of secondary cases of a primary case roughly follows a geometric distribution, and a large proportion of primary cases have the potential to generate more than one secondary case. This indicates that COVID-19 persists easily in the general population if strong measures are not taken, given an R0 similar to that of SARS. Therefore, outbreak mitigation is relatively difficult without extreme efforts such as city lockdowns.
We adopted a method similar to that in [1], and simulated a negative-binomial process to match the observed daily cases in these countries over the chosen time period when the number grew exponentially. A maximum likelihood approach was used to infer R0. The method is also explained in [6][7][8].
Table 2
He et al. BMC Public Health (2020) 20:1558