Timely epidemic monitoring in the presence of reporting delays: anticipating the COVID-19 surge in New York City, September 2020

Harris, Jeffrey E.

doi:10.1186/s12889-022-13286-7

Research
Open access
Published: 02 May 2022

Jeffrey E. Harris^1,2

BMC Public Health volume 22, Article number: 871 (2022)

Abstract

Background

During a fast-moving epidemic, timely monitoring of case counts and other key indicators of disease spread is critical to an effective public policy response.

Methods

We describe a nonparametric statistical method, originally applied to the reporting of AIDS cases in the 1980s, to estimate the distribution of reporting delays of confirmed COVID-19 cases in New York City during the late summer and early fall of 2020.

Results

During August 15–September 26, the estimated mean delay in reporting was 3.3 days, with 87% of cases reported by 5 days from diagnosis. Relying upon the estimated reporting-delay distribution, we projected COVID-19 incidence during the most recent 3 weeks as if each case had instead been reported on the same day that the underlying diagnostic test had been performed. Applying our delay-corrected estimates to case counts reported as of September 26, we projected a surge in new diagnoses that had already occurred but had yet to be reported. Our projections were consistent with counts of confirmed cases subsequently reported by November 7.

Conclusion

The projected estimate of recently diagnosed cases could have had an impact on timely policy decisions to tighten social distancing measures. While the recent advent of widespread rapid antigen testing has changed the diagnostic testing landscape considerably, delays in public reporting of SARS-CoV-2 case counts remain an important barrier to effective public health policy.

Background

Timely surveillance of the incidence of new cases is essential for effective control during an ongoing epidemic. When infections are detected primarily through voluntary testing of symptomatic individuals, as has been the case with the COVID-19 epidemic in the United States, there will be three main sources of incomplete or delayed reporting of new cases. First, there will be underreporting of new cases, especially among asymptomatic or mildly infected individuals who do not seek testing. Second, there will be a testing delay between the actual date when an individual becomes infected and the date when that individual is ultimately tested. Third, unless test samples are rapidly processed, there will be a further reporting delay between the date of testing and the date the test results are communicated by the reporting entity. The present research addresses the third source, the reporting delay.

A statistical method for nonparametric or semiparametric estimation of the distribution of reporting delays was previously investigated in connection with delays in reporting of newly diagnosed AIDS cases during the 1980s [1]. The estimated distribution of delays allowed the analyst to predict the actual incidence of AIDS cases well before all cases were fully reported. That statistical method is adapted here to daily reports of newly diagnosed cases of COVID-19 by the New York City Department of Health and Mental Hygiene during the late summer and early fall of 2020, when there was heightened concern about the possible emergence of a new wave of infection in the city. Our objective is to determine whether this approach could have alerted public officials to the coming surge before it became apparent from other indicators.

Data

All data were downloaded from the New York City health department repository [2]. The data consisted of a series of daily updates of a file named case-hosp-death.csv. In this report, we relied solely on the first two variables in each updated file, labeled DATE_OF_INTEREST and CASE_COUNT, which we interpreted, respectively, as the date of diagnosis and the cumulative number of confirmed COVID-19 cases so far diagnosed by that date. We did not rely on data on hospitalizations or deaths in this study.

Figure 1 displays the reported numbers of test-confirmed, daily COVID-19 cases from June 21 through September 26, 2020. The horizontal axis measures the date of diagnosis, that is, the date on which the confirming diagnostic test was performed, rather than the date on which the case was subsequently reported by the health department.

During September 2020, there was increasing concern among public health officials that the low case counts reported during the summer were beginning to rise, and that the observed increase foreshadowed the onset of a new wave of SARS-CoV-2 infections in the city [3]. During August, as Fig. 1 shows, daily reported case counts had settled down to between 100 and 350, depending on the day of the week. By September 26, the number of cases diagnosed and thus far reported on September 21 had reached 456, as indicated by the arrow in the figure, nearly equaling the previous peak of 457 on July 7.

The precipitous decline in reported cases during September 22–26, however, posed a serious problem of data interpretation. It was widely acknowledged that recent case counts were significantly truncated as a result of reporting delays. Thus, the datapoint for September 26, showing only 11 cases, represented only those cases that were diagnosed and reported on that same date. In its website displaying trends in newly confirmed COVID-19 cases, the health department employed the usual workaround of advising readers that, as a result of delays in reporting, recent data were incomplete.

Methods

Statistical analysis of reporting delays

From successive daily updates of the case-hosp-death.csv file, we computed the quantities y_tu, corresponding to the number of confirmed infections diagnosed on date t but not reported until date t + u, that is, with a delay of u ≥ 0 days. For example, the version of the file case-hosp-death.csv showing all reports through 7/21/2020 indicated that 9 cases had been diagnosed on that date and thus far reported by that date. The following day’s version of case-hosp-death.csv indicated that a total of 65 cases had been diagnosed on 7/21/2020 and reported by 7/22/2020. Thus, we have y_t0 = 9 and y_t1 = 65 − 9 = 56, where t corresponds in this example to the diagnosis date 7/21/2020. The very next day’s version indicated that a total of 132 cases had been diagnosed on 7/21/2020 and reported by 7/23/2020. Thus, we have y_t2 = 132 − 65 = 67. We used this method of successive differences to recover the underlying quantities {y_tu}, which formed the basic data for our analysis.
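The successive-differences step can be sketched in a few lines of Python. This is only an illustration, not the author's posted program: the function name `delay_counts_from_snapshots` and its list-based input are assumptions, and the input is taken to be the running totals for a single diagnosis date as they appear in consecutive daily versions of the file.

```python
def delay_counts_from_snapshots(running_totals):
    """Convert running totals for one diagnosis date into counts by delay.

    running_totals[u] is the number of cases diagnosed on a fixed date t
    that had been reported by date t + u (hypothetical input layout).
    Returns the list [y_t0, y_t1, ...] of cases first reported at delay u.
    """
    counts = []
    previous_total = 0
    for total in running_totals:
        # Cases newly reported at this delay are the increase in the total.
        counts.append(total - previous_total)
        previous_total = total
    return counts


# Worked example from the text: diagnosis date 7/21/2020, with totals
# of 9, 65 and 132 in the files dated 7/21, 7/22 and 7/23.
y_t = delay_counts_from_snapshots([9, 65, 132])
print(y_t)  # [9, 56, 67]
```

Applied to every diagnosis date in turn, the same differencing yields the full array {y_tu}.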

Our statistical approach followed earlier work [1]. Let the possible dates of diagnosis t range from 0 to T, where T > 0 is the last date on which we have received case reports, which we’ll call the cutoff date. Let the duration of reporting delay u range from 0 to n, where n > 0 is assumed to be the longest possible reporting delay. We further assume that T > n > 0. As a result of this assumption, our sample is bifurcated into two parts, which we call the early and late parts, respectively. The early part corresponds to dates of diagnosis t = 0, …, T − n. For these dates, we have by assumption a complete set {y_tu, u = 0, 1, …, n} of all reported cases diagnosed on each date. The late part corresponds to subsequent dates of diagnosis t = T − n + 1, …, T. For these dates, we have only a truncated set {y_tu, u = 0, 1, …, T − t} of reported cases diagnosed on each date, as some diagnoses have not yet been reported by the cutoff date T.

We considered the simplest model where the distribution of delays was independent of the date of diagnosis or any other observable, exogenous variable. That is, the probability that a case diagnosed at date t will be reported with delay u is α_u, where \({\sum}_{u=0}^n{\alpha}_u\) = 1. Let α = (α₀, α₁, …, α_n) denote the vector of all parameters α_u. Extensions of this basic model, including a variation in which \({\sum}_{u=0}^n{\alpha}_u\) < 1, as well as more extended treatments that jointly estimate the incidence of disease and the distribution of reporting delays, are described elsewhere [1].

The basic idea is to estimate the delay distribution α from our observed data, and then use the estimate \(\hat{\alpha}\) to project the total number of cases diagnosed on a given date, including diagnoses yet to be reported. In general, we define \({z}_t=\sum_{u=0}^{\min \left(n,T-t\right)}{y}_{tu}\) as the total number of cases diagnosed on date t that have so far been reported by the cutoff date T. In the early part of the sample, for any date of diagnosis t = 0, …, T − n, this marginal sum simplifies to \({z}_t=\sum_{u=0}^n{y}_{tu}\) and represents the total number of cases diagnosed on that date. So, conditional on the marginal sums z_t, we have the projected number of cases ζ_t(α) = z_t, which is independent of α. Since we have already observed all the cases diagnosed on date t, there is nothing unknown to project.

In the late part of the sample, for any date of diagnosis t = T − n + 1, …, T, we can instead write the marginal sums as \({z}_t=\sum_{u=0}^{T-t}{y}_{tu}\). The projected number of cases diagnosed at date t will depend on the parameters as ζ_t(α) = z_t/Ω_t(α), where \({\Omega}_t\left(\alpha \right)=\sum_{u=0}^{\min \left(n,T-t\right)}{\alpha}_u\) is the estimated probability that a case diagnosed at date t will be reported by the cutoff date T.

We assume that the counts {y_tu} are the realizations of independent Poisson random variables. Given the marginal sums z_t, the conditional likelihood of the parameters α is maximized by the following iterative procedure, which is equivalent to the EM algorithm [4]. Let \({w}_u=\sum_{t=0}^{T-u}{y}_{tu}\) denote the total number of cases reported with a delay of u days, summed over all dates of diagnosis t. We start with initial estimates \({\alpha}_u^{(0)}={w}_u/\sum_{\nu =0}^n{w}_{\nu }\) for all u = 0, …, n. At iteration k = 0, 1, 2, …, with provisional parameters α^(k), we update our parameters to \({\alpha}_u={w}_u/\sum_{t=0}^{T-u}{\zeta}_t\left({\alpha}^{(k)}\right)\), where the denominator is the projected total number of diagnosed cases for which a delay u has been observed. To complete the iteration, we normalize to get \({\alpha}_u^{\left(k+1\right)}={\alpha}_u/\sum_{\nu =0}^n{\alpha}_{\nu }.\) We continue to iterate until ‖α^(k + 1) − α^(k)‖ is arbitrarily small. Once we’ve converged on an estimate \(\hat{\alpha}\), the projected case counts are \({\zeta}_t\left(\hat{\alpha}\right)={z}_t/{\Omega}_t\left(\hat{\alpha}\right)\) for all t = 0, …, T. We employed bootstrap methods to compute confidence intervals around these projections [5].
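The iteration just described can be written out as a short, dependency-free Python sketch. It is an illustration under stated assumptions rather than the author's posted program: the function name, the ragged list-of-lists layout for y, the maximum-absolute-change stopping rule and the tolerance are all choices made here, and the sketch assumes at least one case was reported with zero delay so that no Ω_t is zero. The bootstrap step is not shown.

```python
def estimate_delay_distribution(y, n, tol=1e-10, max_iter=1000):
    """Estimate the delay distribution alpha and the projected counts zeta.

    y[t][u] is the number of cases diagnosed on date t and reported with
    delay u; row t holds delays u = 0..min(n, T - t), with T = len(y) - 1.
    """
    T = len(y) - 1

    # z_t: cases diagnosed on date t and reported by the cutoff date T.
    z = [sum(row) for row in y]

    # w_u: cases reported with delay u, summed over diagnosis dates 0..T-u.
    w = [sum(y[t][u] for t in range(T - u + 1)) for u in range(n + 1)]

    def project(alpha):
        # Omega_t sums alpha_u over u <= min(n, T - t); zeta_t = z_t / Omega_t.
        return [z[t] / sum(alpha[: min(n, T - t) + 1]) for t in range(T + 1)]

    # Initial estimates: observed shares of each delay.
    total = sum(w)
    alpha = [w_u / total for w_u in w]

    for _ in range(max_iter):
        zeta = project(alpha)
        # Unnormalized update: w_u over projected cases at risk of delay u.
        raw = [w[u] / sum(zeta[: T - u + 1]) for u in range(n + 1)]
        norm = sum(raw)
        updated = [a / norm for a in raw]
        change = max(abs(a - b) for a, b in zip(updated, alpha))
        alpha = updated
        if change < tol:
            break

    return alpha, project(alpha)


# Synthetic check (hypothetical data): 100 diagnoses per day with a true
# delay distribution of (0.2, 0.5, 0.3), n = 2, T = 4.
y_demo = [[20, 50, 30], [20, 50, 30], [20, 50, 30], [20, 50], [20]]
alpha_hat, zeta_hat = estimate_delay_distribution(y_demo, n=2)
```

On this synthetic input the initial estimates overweight short delays, because the last two diagnosis dates are truncated, and the iterations should pull the estimate back toward the generating distribution and the projected counts toward 100 per day.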

Results

Distribution of reporting delays

We estimated the probability distribution α of reporting delays up to a maximum of n = 21 days from data {y_tu} on case counts reported during August 15 – September 26, 2020. Thus, we took August 15 as the initial observation date t = 0, while September 26 was date t = T = 42. As a result, the early part of our sample, that is, the range of dates t = 0, …, T − n for which the observed case counts {y_tu, u = 0, …, 21} were complete, ran from August 15 through September 5. The late part of our sample, t = T − n + 1, …, T, in which the observations on y_tu were truncated, ran from September 6 through September 26.
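The mapping from the index t to calendar dates follows directly from t = 0 on August 15, T = 42 and n = 21, and can be checked with standard-library date arithmetic. The helper name `to_date` is introduced here for illustration only.

```python
from datetime import date, timedelta

START = date(2020, 8, 15)  # diagnosis date t = 0
T, n = 42, 21              # cutoff index and maximum reporting delay


def to_date(t):
    """Calendar date corresponding to diagnosis index t."""
    return START + timedelta(days=t)


cutoff = to_date(T)              # last date with any reports
early_end = to_date(T - n)       # last date with complete reporting
late_start = to_date(T - n + 1)  # first date with truncated reporting

print(cutoff, early_end, late_start)
```

With these values the cutoff falls on September 26, the early part ends on September 5, and the truncated late part begins the following day.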

Figure 2 shows the estimated probability distribution \(\hat{\alpha}\) of reporting delays. Only 4.1% of confirmed COVID-19 cases were reported on the same day that the underlying diagnostic test was performed, that is, \({\hat{\alpha}}_0=\) 0.041. The estimated probability of reporting within 5 days of diagnosis was \(\sum_{u=0}^5{\hat{\alpha}}_u\) = 0.870. The mean reporting delay, conditional upon full reporting by n = 21 days, was \(\sum_{u=0}^n u{\hat{\alpha}}_u\) = 3.31 days.

We observed shifts in the estimated distribution of reporting delays in New York City during the first 6 months of the COVID-19 epidemic. Reporting delays initially increased from the initial outbreak in early March through June 2020. From late June onward, however, the reporting delay distribution began to shift to the left, as shown by the comparison of the estimated cumulative distribution curves for the periods from June 21 – August 1 and from August 15 – September 26, shown in the Supplement Fig. A. During the more recent interval from August 15 – September 26, however, the reporting delay distribution appeared to be stable. For the 6-week interval ending October 20, as noted in the Supplement, the mean reporting delay was 3.21 days.

Incidence of COVID-19 cases corrected for reporting delays

The gray datapoints in Fig. 3 reproduce the observed counts of newly diagnosed COVID-19 cases during June 21 – September 26, 2020, as shown in Fig. 1 above. While only the counts z_t from August 15 (date t = 0) entered our estimation algorithm, we show the complete series back to June 21 to facilitate comparison.

The superimposed pink datapoints in the figure, by contrast, show projected counts \({\zeta}_t\left(\hat{\alpha}\right)\) for the most recent days t = T − n, …, T, that is, from September 5–26. Before t = T − n, our projections simplify to ζ_t = z_t, as reporting beyond n = 21 days is assumed to be complete. For September 21, as indicated by the arrow, the reported case count as of September 26 was z_t = 456. (See also Fig. 1.) By contrast, the projected case count was \({\zeta}_t\left(\hat{\alpha}\right)={z}_t/{\Omega}_t\left(\hat{\alpha}\right)\) = 524, where we estimated \({\Omega}_t\left(\hat{\alpha}\right)\) = 0.870. The 95% confidence interval (95% CI) surrounding this projected case count was 507–542, significantly above the reported case count. Thereafter, the projected case counts showed a continuation of the surge, once more reaching 524 (95% CI, 492–562) on September 23. After remaining above 400 per day during September 24–25, the projected count displayed an expected weekend drop to 269 (95% CI, 194–440) on Saturday, September 26. Supplement Fig. B graphically depicts the estimated confidence intervals around the most recent projected case counts.
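As a check on the arithmetic for September 21: counting from August 15 as t = 0, that date is t = 37, so T − t = 42 − 37 = 5 and the relevant reporting probability is the cumulative probability of a delay of at most 5 days reported above. Using only the figures given in the text,

\[
{\zeta}_{37}\left(\hat{\alpha}\right)=\frac{z_{37}}{{\Omega}_{37}\left(\hat{\alpha}\right)}=\frac{456}{\sum_{u=0}^{5}{\hat{\alpha}}_u}=\frac{456}{0.870}\approx 524 .
\]

On this reading, roughly 524 − 456 = 68 of the cases diagnosed on September 21 were projected to be still unreported as of the cutoff date.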

Neither the estimated distribution of reporting delays (Fig. 2) nor the projected number of diagnosed COVID-19 cases (Fig. 3) varied significantly when we extended the observation interval backward before August 15 or increased the maximum duration of reporting delays beyond 21 days. (Results not shown.)

Comparison of projected and ultimately reported cases

Figure 4 compares our projected case counts (ζ_t), again colored in pink, with case counts ultimately reported by the health department as of November 7, 2020, shown in light blue. Here, we have shifted the timeline to cover the interval from August 23 – October 11. Comparison of the two series shows significant concordance between the projected case counts and the numbers of cases ultimately reported almost 4 weeks after the end of the interval. A chi-squared test of goodness of fit failed to reject the null hypothesis that the distributions of the projected and reported case counts were equal (χ² = 17.26, 20 degrees of freedom, yielding p = 0.364).

Supplement Fig. C plots the projected against the ultimately reported daily case counts during September 5–26. There was no significant serial correlation of the residuals.

Discussion

The usual workaround to address data truncation due to delayed reporting is to attach an advisory to a website graphic warning the viewer that the most recent trend is to be ignored. The key message of this article is that, so long as the distribution of reporting delays is stable, the most recently reported case counts need not be thrown out. Instead, we can use recent past data on reporting delays in order to project the population-level counts of new cases as if they had all been reported on the date of diagnostic testing.

As our study of the surge in COVID-19 cases in the fall of 2020 in New York City suggests, the resulting timely estimate of recently diagnosed cases could have had an impact on policy decisions to tighten social distancing measures. On September 29, New York City Mayor de Blasio signaled his intention to close nonessential businesses and all public and private schools in key neighborhoods of the boroughs of Queens and Brooklyn [6]. But it was not until October 6 that New York Governor Cuomo actually intervened [7]. As it turned out, coronavirus cases had actually been increasing throughout the city [3].

Diagnosis dates versus report receipt dates

Many state and local health departments – including the New York City health department studied here – have tabulated counts of COVID-19 cases according to the date the relevant diagnostic test was performed. This reporting convention not only creates the data truncation problem illustrated by the precipitous drop in case counts at the far right in Fig. 1, but it also requires the public health authority to continually update past counts every time a new case report is received. These difficulties can be avoided by the alternative of reporting cases according to the date the test result was received. But that alternative can give a biased picture of recent incidence trends. For example, if a testing site delivers results to a public health authority in periodic batches, the reported case counts can show artifactual surges [8].

Absolute case counts versus the test positivity rate

One alternative to relying on absolute case counts is to compute positive tests as a fraction of all tests performed, an indicator often called the test positivity rate. This approach does not really confront the problem of reporting delays. Instead, it simply converts the problem into one of delayed reporting of test positivity rates. In fact, if negative test results are reported with a different delay distribution than positive tests, the resulting bias due to reporting delays may be exaggerated. In any event, the test positivity rate depends on the number of negative test results, which is itself an endogenous variable. Relying on a misleading decline in test positivity due to a surge in negative testing by worried well individuals, state-level officials later relaxed social distancing measures in early November when absolute case counts were still rising in the most vulnerable neighborhoods in Brooklyn [3].
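The effect of differential reporting delays on the positivity rate can be illustrated with a small numerical sketch. All numbers below are hypothetical and chosen only for illustration; the function name `observed_positivity` is likewise an assumption, not part of any source data or program.

```python
def observed_positivity(true_pos, true_neg, frac_pos_reported, frac_neg_reported):
    """Positivity rate computed from results reported so far.

    true_pos, true_neg: tests actually performed on a given date.
    frac_pos_reported, frac_neg_reported: shares of positive and negative
    results that have reached the health department by the cutoff date.
    """
    reported_pos = true_pos * frac_pos_reported
    reported_neg = true_neg * frac_neg_reported
    return reported_pos / (reported_pos + reported_neg)


# Hypothetical day: 100 positive and 900 negative tests (true rate 10%).
same_delay = observed_positivity(100, 900, 0.5, 0.5)
slower_positives = observed_positivity(100, 900, 0.5, 0.9)
print(round(same_delay, 3), round(slower_positives, 3))
```

When positive and negative results share the same delay distribution, truncation cancels out of the ratio. When positives arrive more slowly than negatives, as in the second call, the observed rate for recent dates falls below the true rate, so the positivity series inherits a delay bias of its own.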

Slow molecular versus rapid antigen diagnostic tests for SARS-CoV-2 infection

Our focus here has been on diagnostic tests for active infection, rather than antibody tests to detect whether an individual mounted an immune response to a past infection. At the time of our study of New York City in the fall of 2020, molecular tests that amplify the virus’ genetic material – particularly tests based on the polymerase chain reaction (PCR) – were far and away the dominant diagnostic technology. More recently, rapid antigen tests of specific proteins encasing the virus’ genetic material have become widely available as an alternative [9]. While clinics and doctors’ offices are required to report the results of these rapid antigen tests to local health agencies, individuals performing these tests at home generally do not. As a result, reported case counts are still dominated by the slower PCR technology. With molecular tests remaining the gold standard for testing [10], the problem of delays in publicly reported case counts of COVID-19 has not been obviated.

Aggregate population delays versus individual-level delays

We have described and tested a statistical method for overcoming reporting delays at the aggregate population level. Our approach does not accelerate the reporting of test results at the individual level. We focus on the time delay from the performance of a diagnostic test to its appearance in a health department’s aggregate public tally. Our results say nothing about the time required to privately communicate test results to individual patients. Delays in informing individuals about positive and negative test results can critically influence decisions to go into or come out of isolation, and to stay away from or return to work. Technologies that facilitate communication of at-home, rapid antigen tests to public authorities may reduce delays both at the aggregate population level and the individual patient level.

Testing delays and underreporting

Even with the proposed statistical correction for reporting delay, there remains the problem of testing delay. In the system of voluntary, symptom-motivated testing in the United States, testing delay has two components. The first is the incubation period between initial infection and first symptoms of illness, initially estimated to be about 5 days for the ancestral strain of SARS-CoV-2, about 4 days for the Delta variant, and closer to 3 days for the more recent Omicron variant [11]. The second is the additional delay between the onset of symptoms and the date the test is performed. Widely available rapid antigen testing may have partially reduced the second bottleneck.

Quite apart from the issue of testing delay, there is now growing evidence that during the Omicron wave, as many as 75% of all SARS-CoV-2 infections have not been reported by public authorities [12]. While some of this underreporting has been the result of the growing use of in-home rapid antigen tests, there has also been a surge in asymptomatic and mildly symptomatic infections. Quantitative modeling of underreporting remains a challenging problem.

Joint modeling of disease incidence and reporting delays

The statistical method employed here focused sharply on delays in the reporting of diagnosed COVID-19 cases. We did not model underlying trends in the incidence of COVID-19. The original study that we relied upon here explored a combined model of both the incidence of AIDS and the delayed reporting of the disease [1]. That combined model contained one set of parameters (call it β) governing the incidence of AIDS and another set of parameters (which we’ve called α) governing the distribution of reporting delays, but no parameters common to both processes. So long as the combined model adhered to this so-called separability property, there was no inherent bias in exclusively estimating the parameters α of the distribution of reporting delays. More recent attempts to jointly estimate the incidence and reporting delay distribution of Salmonella infections, dengue fever and acute respiratory illnesses have all shared this underlying separability feature [13,14,15,16,17].

The separability property would be violated, for example, if a high rate of disease incidence caused bottlenecks in case reporting. In abstract terms, the joint model of disease incidence and reporting delays would have a common parameter γ that influenced both processes. Here, we found that COVID-19 case reporting delays declined significantly in New York City during the summer of 2020 (Supplement Fig. A), a period of declining reported disease incidence. However, the observed decline in reporting delays persisted even as new cases started to rise in early October 2020. The shift in the reporting-delay distribution appears to have resulted from independent technical improvements in testing capacity that continued even during the subsequent winter surge [18, 19].

Conclusion

The application of delay-corrected estimates of recent incident cases can play an important role in anticipating the emergence of a new wave during an ongoing epidemic. In the case of New York City in September 2020, which we studied here, the projected estimate of recently diagnosed cases could well have had an impact on timely policy decisions to tighten social distancing measures. While the recent advent of widespread rapid antigen testing has changed the diagnostic testing landscape considerably, delays in public reporting of SARS-CoV-2 case counts remain an important barrier to effective public health policy.

Availability of data and materials

Supporting programs and data have been posted at https://osf.io/u4svy/. An earlier version of this article was posted on the medRxiv.org preprint server at https://doi.org/10.1101/2020.08.02.20159418.

Abbreviations

CI: Confidence Interval
PCR: Polymerase Chain Reaction

References

Harris JE. Reporting delays and the incidence of AIDS. J Am Stat Assoc. 1990;85(412):915–24.
New York City Department of Health and Mental Hygiene. case-hosp-death.csv (comma-separated-value data file). https://github.com/nychealth/coronavirus-data/blob/master/archive/case-hosp-death.csv: Last Commit, 13, 2020 2020.
Harris JE. Failure of concentric regulatory zones to halt the spread of COVID-19 in south Brooklyn, New York: October–November 2020. https://www.medrxiv.org/content/10.1101/2021.11.18.21266493v1:medRxiv.org, November 21 2021.
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B. 1977;39:1–22.
StataCorp. Stata statistical software: release 17. College Station: StataCorp LLC; 2021.
Lardieri A. De Blasio Warns of Reimposing Coronavirus Restrictions as New York Positivity Rate Rises. https://www.usnews.com/news/health-news/articles/2020-09-29/de-blasio-warns-of-reimposing-coronavirus-restrictions-as-new-york-positivity-rate-rises: US News, Sept 29 2020.
New York Governor. Governor Cuomo Announces New Cluster Action Initiative https://www.governor.ny.gov/news/governor-cuomo-announces-new-cluster-action-initiative: Press Release, Oct 6 2020.
Harris JE. Data from the COVID-19 epidemic in Florida suggest that younger cohorts have been transmitting their infections to less socially mobile older adults. Rev Econ Household. 2020;18(ePub August 22):1019–37. https://doi.org/10.1007/s11150-020-09496-w.
U.S. Food & Drug Administration. Coronavirus disease 2019 testing basics. https://www.fda.gov/consumers/consumer-updates/coronavirus-disease-2019-testing-basics: Last accessed 7 Feb 2022.
U.S. Centers for Disease Control and Prevention. Interim Guidance for Antigen Testing for SARS-CoV-2. https://www.cdc.gov/coronavirus/2019-ncov/lab/resources/antigen-tests-guidelines.html: Updated Jan 20 2022.
Jansen L, Tegomoh B, Lange K, et al. Investigation of a SARS-CoV-2 B.1.1.529 (Omicron) Variant Cluster - Nebraska, November–December 2021. MMWR Morb Mortal Wkly Rep. 2021;70(5152):1782–4. https://doi.org/10.15585/mmwr.mm705152e3 [published Online First: 20211231].
Harris JE. Estimated Fraction of Incidental COVID Hospitalizations in a Cohort of 250 High-Volume Hospitals Located in 164 Counties. https://www.medrxiv.org/content/10.1101/2022.01.22.22269700v1: MedRxiv, Jan 24 2022.
Salmon M, Schumacher D, Stark K, et al. Bayesian outbreak detection in the presence of reporting delays. Biom J. 2015;57(6):1051–67. https://doi.org/10.1002/bimj.201400159 [published Online First: 20150806].
Bastos LS, Economou T, Gomes MFC, et al. A modelling approach for correcting reporting delays in disease surveillance data. Stat Med. 2019;38(22):4363–77. https://doi.org/10.1002/sim.8303 [published Online First: 20190710].
McGough SF, Johansson MA, Lipsitch M, et al. Nowcasting by Bayesian smoothing: a flexible, generalizable model for real-time epidemic tracking. PLoS Comput Biol. 2020;16(4):e1007735. https://doi.org/10.1371/journal.pcbi.1007735 [published Online First: 20200406].
Rotejanaprasert C, Ekapirat N, Areechokchai D, et al. Bayesian spatiotemporal modeling with sliding windows to correct reporting delays for real-time dengue surveillance in Thailand. Int J Health Geogr. 2020;19(1):4. https://doi.org/10.1186/s12942-020-00199-0 [published Online First: 20200303].
Stoner O, Economou T. Multivariate hierarchical frameworks for modeling delayed reporting in count data. Biometrics. 2020;76(3):789–98. https://doi.org/10.1111/biom.13188 [published Online First: 20191129].
CBS New York. Mayor De Blasio Says Delay In Coronavirus Test Results Should Be Resolved: 'Getting Much Better'. https://www.cbsnews.com/newyork/news/mayor-de-blasio-says-delay-in-coronavirus-test-results-should-be-resolved-getting-much-better/: Jul 23 2020.
New York Governor. Governor Cuomo Announces New Record-high Number of COVID-19 Tests Reported to New York State https://www.governor.ny.gov/news/governor-cuomo-announces-new-record-high-number-covid-19-tests-reported-new-york-state-1: New York State, Governor’s Press Office, Sept 20 2020.

Acknowledgments

This article represents the sole opinion of its author and does not necessarily represent the opinions of the Massachusetts Institute of Technology, Eisner Health, or any other organization.

Funding

None.

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Jeffrey E. Harris
Eisner Health, Los Angeles, CA, 90015, USA
Jeffrey E. Harris

Authors

Jeffrey E. Harris

Contributions

The sole author (JEH) is responsible for the conceptualization of the work, data analysis, programming, drafting the manuscript, and drawing the figures. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Jeffrey E. Harris.

Ethics declarations

Ethics approval and consent to participate

This study relies exclusively on publicly available data that contain no individual identifiers. No ethics approval or individual consent to participate was required.

Consent for publication

The sole author (JEH) consents to the publication of this manuscript.

Competing interests

The sole author (JEH) has no competing interests to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Fig. A. Estimated Cumulative Distribution of COVID-19 Reporting Delays, New York City, June 21 – August 1 (Green) and August 15 – September 26, 2020 (Purple). During June 21 – August 1, an estimated 65.2% of cases were reported within 5 days of the date when the diagnostic test was performed. During August 15 – September 26, this proportion had increased to 87.0%. The mean reporting delay was 4.96 days during June 21 – August 1 and 3.31 days during August 15 – September 26. Thereafter, the cumulative distribution remained relatively stable. During September 8 – October 20, an estimated 88.6% of cases were reported within 5 days, while the mean reporting delay was 3.21 days. (Results not shown.)

Fig. B. Reported and Projected COVID-19 Diagnoses, New York City, September 6–26, 2020, Including 95% Confidence Intervals. As in Fig. 3 in the main text, reported cases as of September 26 (z_t) are indicated by gray-colored datapoints. Projected cases (ζ_t), based upon the estimated distribution (\(\hat{\alpha}\)) of reporting delays, are indicated by the pink datapoints. Computed 95% confidence intervals, based upon the bootstrap method, are also shown for the projected diagnoses from September 20–26. Before September 20, the computed confidence intervals were smaller than the diameters of the datapoints.

Fig. C. Projected Daily Case Count as of September 26 Versus Reported Daily Case Count as of November 7. The superimposed 45-degree line indicates equality between the two variables. The arrow shows the data for September 21, where the projected count was 524 and the ultimately reported count was 513.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Harris, J.E. Timely epidemic monitoring in the presence of reporting delays: anticipating the COVID-19 surge in New York City, September 2020. BMC Public Health 22, 871 (2022). https://doi.org/10.1186/s12889-022-13286-7

Received: 24 February 2022
Accepted: 21 April 2022
Published: 02 May 2022
DOI: https://doi.org/10.1186/s12889-022-13286-7
