Identifying US County-level characteristics associated with high COVID-19 burden

Background: Identifying county-level characteristics associated with high coronavirus disease 2019 (COVID-19) burden can allow for data-driven, equitable allocation of public health intervention resources and reduce burdens on health care systems.
Methods: Synthesizing data from various government and nonprofit institutions for all 3142 United States (US) counties, we studied county-level characteristics that were associated with cumulative and weekly case and death rates through December 21, 2020. We used generalized linear mixed models to model cumulative and weekly (40 repeated measures per county) cases and deaths. Cumulative and weekly models included state fixed effects and county-specific random effects. Weekly models additionally allowed covariate effects to vary by season and included US Census region-specific B-splines to adjust for temporal trends.
Results: Rural counties, counties with more minorities and white/non-white segregation, and counties with more people with no high school diploma and with medical comorbidities were associated with higher cumulative COVID-19 case and death rates. In the spring, urban counties and counties with more minorities and white/non-white segregation were associated with increased weekly case and death rates. In the fall, rural counties were associated with larger weekly case and death rates. In the spring, summer, and fall, counties with more residents with socioeconomic disadvantage and medical comorbidities were associated with greater weekly case and death rates.
Conclusions: These county-level associations are based on complete data from the entire country, come from a single modeling framework that longitudinally analyzes the US COVID-19 pandemic at the county level, and are applicable to guiding government resource allocation policies for different US counties.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12889-021-11060-9.

dividing the number of deaths by the number of reported cases in each county. CFR regression was performed in the same way as the cumulative death rate regression, by fitting Poisson mixed models, except that log(total reported cases) was used as the offset instead of log(population size). The IFR was calculated by dividing the number of deaths by the number of infected cases in each county.
Since the county-specific numbers of total infected cases were not observed and would likely be underestimated by the numbers of reported cases, we first estimated the county-specific total number of infected cases by dividing the number of total reported cases by a single constant reporting rate. However, we found that although the intercept estimate changed, the other regression coefficient estimates did not. We then allowed the reporting rates to vary by state. This time the intercept and state fixed effects changed, but the rest of the regression coefficient estimates were again unchanged.
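The invariance described above can be checked numerically. The following is a minimal sketch, not the authors' code: it fits a plain Poisson regression (no state or random effects) by Newton-Raphson on a small made-up data set, once with offset log(reported cases) and once with offset log(reported cases / rate). The data values, the function name `fit_poisson`, and the reporting rate of 0.25 are all hypothetical and chosen only for illustration.

```python
import math

def fit_poisson(y, x, offset, iters=25):
    """Newton-Raphson fit of log(mu) = offset + b0 + b1 * x; returns (b0, b1)."""
    # Start at the intercept-only maximum likelihood estimate.
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(o + b0 + b1 * xi) for o, xi in zip(offset, x)]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical county data: deaths, reported cases, one covariate.
deaths = [12, 30, 7, 55, 21, 9]
cases = [400, 900, 350, 1200, 800, 500]
x = [0.2, 0.5, -0.1, 0.9, 0.3, -0.4]

rate = 0.25  # assumed constant reporting rate (illustrative only)

# CFR-style fit: offset is log(reported cases).
cfr = fit_poisson(deaths, x, [math.log(c) for c in cases])
# IFR-style fit: offset is log(estimated infections) = log(cases / rate).
ifr = fit_poisson(deaths, x, [math.log(c / rate) for c in cases])

print("CFR fit (intercept, slope):", cfr)
print("IFR fit (intercept, slope):", ifr)
print("intercept shift:", ifr[0] - cfr[0], "log(rate):", math.log(rate))
```

In this sketch the slope is the same in both fits and the intercept moves by log(rate), because a constant added to the offset can always be absorbed by the intercept. The same reasoning extends to state-specific rates once state fixed effects are in the model.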
This motivated us to explore analytically how allowing the ascertainment rates to vary between states and counties would affect estimation of the state fixed effects and the random effects parameters. To define the CFR model, assume $Y_{ij}$ is the number of reported cases and $\mu_{ij}$ is the mean number of cumulative deaths in county $j$ of state $i$.
Then the CFR Poisson mixed model can be written as $\log(\mu_{ij}) = \log(Y_{ij}) + \alpha_i + X_{ij}^{T}\beta + b_{ij}$, where $\alpha_i$ is the state fixed effect, $X_{ij}$ is a vector of covariates, $\beta$ is a vector of coefficients, and the $b_{ij}$ are county-specific random effects to account for overdispersion. Further suppose the $b_{ij}$ follow a normal distribution $N(0, \sigma^2)$. Assuming the county-level deviations in log ascertainment rates to be county-specific random effects following a mean-zero normal distribution, it follows that the IFR regression model is identical to the CFR regression model, except that the estimated state effects and the county-specific random effects can also be interpreted as capturing the state- and county-level ascertainment rates. Therefore, the relative risk results, which are identical for the CFR and IFR regressions, are presented together in Additional Figure 5.
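The substitution behind this equivalence can be sketched as follows. The notation is assumed here for illustration and is not necessarily the authors' original: $Y_{ij}$ denotes reported cases, $I_{ij}$ the unobserved number of infected cases, $\pi_i$ a state-level ascertainment rate, $e_{ij}$ a county-level deviation on the log scale, and starred quantities the parameters of the IFR model. Independence of $b^{*}_{ij}$ and $e_{ij}$ is an added assumption used only to write the combined variance as a sum.

```latex
\begin{align*}
% CFR model: offset is the log number of reported cases
\log \mu_{ij} &= \log Y_{ij} + \alpha_i + X_{ij}^{T}\beta + b_{ij},
  & b_{ij} &\sim N(0, \sigma^2) \\
% Assumed link between reported and infected cases
\log Y_{ij} &= \log I_{ij} + \log \pi_i + e_{ij},
  & e_{ij} &\sim N(0, \tau^2) \\
% IFR model: offset is the log number of infected cases
\log \mu_{ij} &= \log I_{ij} + \alpha^{*}_i + X_{ij}^{T}\beta + b^{*}_{ij},
  & b^{*}_{ij} &\sim N(0, \sigma^{*2}) \\
% Substitute log I_ij = log Y_ij - log pi_i - e_ij
 &= \log Y_{ij} + (\alpha^{*}_i - \log \pi_i) + X_{ij}^{T}\beta + (b^{*}_{ij} - e_{ij})
\end{align*}
```

Matching terms with the CFR model gives $\alpha_i = \alpha^{*}_i - \log \pi_i$ and $b_{ij} = b^{*}_{ij} - e_{ij}$, with $\sigma^2 = \sigma^{*2} + \tau^2$ under the independence assumption, while $\beta$ is the same in both models.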
The CFR/IFR results differed from the cumulative death rate results, possibly because of differential underestimation of asymptomatic and mildly symptomatic cases by race/ethnicity; selection bias in who was tested (e.g., symptomatic subjects were more likely to be tested and to test positive); large fluctuations in the numbers of tests from county to county; and insufficient testing capacity. Additional data collection, such as county-level testing data and race/ethnicity-specific case and death counts, is needed to better estimate the number of infected cases. This will allow for more accurate estimation of county-specific ascertainment rates and make IFR analysis results more reliable.

Varying Reporting Rates for Total Case and Death Rates
The discussion above also illustrates how varying reporting rates can be accounted for in the cumulative case and death rate models. We make this explicit below. As in the main text, assume $Y_{ij}$ is the number of reported cases and $N_{ij}$ is the population of county $j$ in state $i$. $Y_{ij}$ can be modeled by a Poisson mixed model with expected cases $\mu_{ij}$: $\log(\mu_{ij}) = \log(N_{ij}) + \alpha_i + X_{ij}^{T}\beta + b_{ij}$, where $\alpha_i$ is the state effect, $X_{ij}$ is a vector of covariates, $\beta$ is a vector of coefficients, and the $b_{ij}$ are county-specific random effects that follow a normal distribution $N(0, \sigma^2)$. Assuming the county-level deviations in log ascertainment rates to be county-specific random effects following a mean-zero normal distribution, it follows that the observed case rate regression for the reported cases and the true infected case rate regression for the infected cases are the same, except that the estimated state effects and the county-specific random effects can also be interpreted as capturing the state- and county-level ascertainment rates.
In other words, the state fixed effects can account for any variation in reporting rates between states: covariate coefficient estimates for $\beta$ will be unchanged, and the state fixed effects $\alpha_i$ will be reparameterized to absorb the state-level reporting rates. The county random effects account for additional variation in reporting rates between counties within a state, under the additional condition that the final reparameterized random effects are normally distributed. Covariate coefficient estimates for $\beta$ will again be unchanged, and the random effects variance $\sigma^2$ will be reparameterized to reflect the additional county-level variation in reporting rates.
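The reparameterization for the case rate model can be written out explicitly. This is a sketch under assumed notation, not necessarily the authors' original: $\mu^{*}_{ij}$ is the expected number of truly infected cases, $\pi_i$ a state-level reporting rate, $e_{ij}$ a county-level deviation on the log scale, and starred parameters belong to the true infected case rate model. Independence of $b^{*}_{ij}$ and $e_{ij}$ is an added assumption. Unlike in the CFR/IFR setting, the reporting rate here scales the outcome rather than the offset, so it enters with a positive sign.

```latex
\begin{align*}
% True infected case rate model
\log \mu^{*}_{ij} &= \log N_{ij} + \alpha^{*}_i + X_{ij}^{T}\beta + b^{*}_{ij},
  & b^{*}_{ij} &\sim N(0, \sigma^{*2}) \\
% Assumed reporting mechanism: only a fraction of infections is reported
\log \mu_{ij} &= \log \mu^{*}_{ij} + \log \pi_i + e_{ij},
  & e_{ij} &\sim N(0, \tau^2) \\
% Observed case rate model after substitution
\log \mu_{ij} &= \log N_{ij} + (\alpha^{*}_i + \log \pi_i) + X_{ij}^{T}\beta + (b^{*}_{ij} + e_{ij})
\end{align*}
```

Under these assumptions the observed model has state effects $\alpha_i = \alpha^{*}_i + \log \pi_i$ and random effects $b_{ij} = b^{*}_{ij} + e_{ij} \sim N(0, \sigma^{*2} + \tau^2)$, and $\beta$ is unchanged. If $e_{ij}$ is not normally distributed, the combined random effect need not be normal, which is the restriction on within-state variability noted in the text.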
To emphasize, any state-to-state variability in reporting rates can be accounted for, but only a more restricted set of within state county-to-county variability in reporting