Estimating the risk of SARS-CoV-2 deaths using a Markov switching-volatility model combined with heavy-tailed distributions for South Africa

Mthethwa, Nobuhle; Chifurira, Retius; Chinhamu, Knowledge

doi:10.1186/s12889-022-14249-8

Research
Open access
Published: 07 October 2022

BMC Public Health volume 22, Article number: 1873 (2022)

Abstract

Background

SARS-CoV-2 (Covid-19 virus) infection exposed the unpreparedness of African countries, South Africa included, for health-related crises. Africa recorded more than 211 853 deaths as a consequence of Covid-19. When a rare and deadly disease that requires urgent hospitalisation strikes, governments and healthcare providers are usually caught unprepared, resulting in a huge loss of lives. At the beginning of such pandemics, there is usually too little data for health practitioners and academics to forecast the number of patients or deaths related to the pandemic. This study aims to predict the number of deaths associated with Covid-19 infection. With the number of deaths available on a daily basis, the results of this study are important for informing and planning health policy.

Methods

This study uses the daily number of deaths due to Covid-19 infection. Exploratory data analysis reveals that the data exhibits non-normality, three structural breaks and volatility clustering. A Markov-switching (MS) generalized autoregressive conditional heteroscedasticity (GARCH)-type model combined with heavy-tailed distributions is fitted to the returns of the data. Using the daily reported Covid-19-related deaths available up to 26 August 2021, we report 10-day-ahead forecasts of deaths. All forecasts are compared with the values actually observed in the forecasting period.

Results

The Anderson–Darling goodness-of-fit test confirms that the fitted models are adequate for the data. The Kupiec likelihood ratio test and the root mean square error (RMSE) were used to select the most robust model at different risk levels. At the 95% level, the MS(3)-GARCH(1,1) model combined with Pearson's type IV distribution (PIVD) is the best model. This indicates that the proposed best-fitting model is reasonable and can be used for predicting the daily number of deaths due to Covid-19.

Conclusion

The MS(3)-GARCH(1,1)-PIVD model provides a reliable and accurate method for predicting the minimum number of deaths due to Covid-19. The accuracy of the proposed model will assist policymakers, academics and health practitioners in forecasting the volatility of future health-related deaths, where the predictability of volatility plays an integral role in health risk management.

Introduction

SARS-CoV-2, commonly known as the Covid-19 virus, first emerged in Wuhan, China, in December 2019 [1]. The virus rapidly spread to over 100 countries, South Africa included. Africa has a large population with compromised immune systems and a high prevalence of diseases such as Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome, malaria, tuberculosis and many more [2]. The continent has weak health care systems and poor economic discipline, and for these reasons Africa differs from the other continents that are dealing with Covid-19 [2].

In South Africa, the first case of Covid-19 was reported on the 5th of March 2020. It resulted from international travel from Italy, which led to local transmission of the virus [3]. Numerous ways of curbing the spread of Covid-19 were communicated, including social distancing, regular sanitizing and washing of hands, quarantine, and lockdown [4]. A lockdown restricts people from leaving or entering buildings or other locations and is used as a security measure in emergencies. The lockdown resulted in the closing of borders, the closure of schools (followed later by online learning), and the closure of businesses. Further restrictions were imposed, including a ban on alcohol and the closing of clubs, entertainment areas, churches and anything else that involved public gatherings. Even at funerals, attendance was limited to a certain number of people. Businesses reported a significant decline in their returns because of lockdown restrictions, especially those classified as "non-essential" [5]. Besides the threats posed by Covid-19, South Africa had pre-existing problems, including unstable economic growth, a high unemployment rate, falling per capita income and unsustainable government debt trends [6].

South Africa was hit the hardest by Covid-19 among African countries, reporting over 2,9 million confirmed cases and over 89 500 Covid-19-related fatalities as of the 19th of November 2021. South Africa experienced significant peaks in the number of deaths during the first and second waves of the Covid-19 pandemic, within the 2021 Mid-year Population Estimates (MYPE) period between July 2020 and June 2021. This yielded a noticeable increase in the crude death rate (CDR) from 8,7 deaths per 1 000 people in 2020 to 11,6 deaths per 1 000 people in 2021. The rise in deaths in 2021 (approximately 34%) caused the 2021 life expectancy (LE) at birth for South Africa to plunge [5].

Shim et al. (2021) [7] estimated the risk of Covid-19 deaths during the outbreak in Korea using the time-delay adjusted crude case fatality risk (CFR). The data set is from the Korea Centers for Disease Control and Prevention (KCDC) and covers four geographic areas: Daegu, Gyeongsangbuk-do, other regions, and Korea as a whole (national). The authors found that Gyeongsangbuk-do and Daegu were the most severely affected areas, while the other regions and the rest of Korea had a less severe profile. Furthermore, the fatality risk due to Covid-19 was found to increase with age, meaning that older groups exhibit a higher case fatality.

Mizumoto and Chowell (2020) [8] conducted a study in China similar to that of Shim et al. [7], estimating the risk of death from Covid-19 using the CFR. The data set is from Hubei province and the city of Wuhan. The gamma, exponential and log-normal distributions were fitted. The results showed that the gamma distribution gave the best fit for the delays from hospitalization to death, and the log-normal distribution gave the best fit for the delays from illness onset to death.

Nyabadza et al. (2020) [4] modelled the impact of social distancing on Covid-19 in South Africa using the susceptible-exposed-infected-removed (SEIR) model. The data was obtained from the Coronavirus COVID-19 (2019-nCoV) data repository for South Africa, for March 2020. The model was fitted to the cumulative cases before and during the lockdown. The model showed that with the implementation of social distancing under the initial lockdown level between March 26 and April 13, 2020, there would be an increase in the number of infected cases. The model was also used to examine the effect of relaxing the social distancing measures after the first announcement of the lockdown. The results reveal that relaxing social distancing would increase the cases by a certain percentage, and that tightening it would reduce them. The model correctly forecasted the number of cases after the initial lockdown level was relaxed towards the end of April 2020. These results have implications for management and policy direction in the early phase of the epidemic.
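To make the SEIR mechanism concrete, the following is a minimal sketch of a generic SEIR model integrated with a simple Euler scheme. It is not the model of Nyabadza et al. [4]: the function name simulate_seir, the parameter values (contact rate beta, incubation rate sigma, removal rate gamma, population size) and the step size are all assumptions chosen for illustration only. Social distancing is represented, loosely, as a lower contact rate.

```python
def simulate_seir(beta, sigma=0.2, gamma=0.1, n=1_000_000, e0=10, days=200, dt=0.1):
    """Euler integration of a basic SEIR model (all parameters hypothetical).

    Returns the peak number of infectious people and the final total
    population, which should stay equal to n because the compartments
    only exchange people with each other.
    """
    s, e, i, r = float(n - e0), float(e0), 0.0, 0.0
    peak = 0.0
    for _ in range(int(days / dt)):
        new_exposed = beta * s * i / n * dt   # S -> E through contact with I
        new_infectious = sigma * e * dt       # E -> I after incubation
        new_removed = gamma * i * dt          # I -> R (recovery or death)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        peak = max(peak, i)
    return peak, s + e + i + r


# Assumed contact rates: 0.30 for relaxed measures, 0.15 under social distancing.
peak_relaxed, total_relaxed = simulate_seir(beta=0.30)
peak_distanced, total_distanced = simulate_seir(beta=0.15)
print(f"peak infectious, relaxed:   {peak_relaxed:,.0f}")
print(f"peak infectious, distanced: {peak_distanced:,.0f}")
```

Under these assumed values, the run with the lower contact rate produces a lower peak within the simulated window, which is the qualitative direction of the effect described above.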

Rivera et al. (2020) [9] investigated excess mortality during Covid-19 in the United States. The study looked at deaths directly and indirectly caused by Covid-19 in 13 states with high Covid-19 deaths: Illinois, New Jersey, Massachusetts, Connecticut, New York, Washington, Colorado, Michigan, California, Florida, Indiana, Pennsylvania and Louisiana. The data was collected from the National Center for Health Statistics (NCHS) Mortality Surveillance System (MSS) data release. A semiparametric model and a conventional model were used, and the semiparametric model was found to offer more advantages than the conventional approach. The authors concluded that all the states have excess all-cause mortality that exceeds the number of deaths attributed to Covid-19.

Reddy et al. (2021) [10] and Surowiec and Warowny (2021) [14] predicted the number of deaths due to Covid-19. Reddy et al. [10] used a data-driven approach to predict the short-term, real-time total number of Covid-19 cases and deaths using linear growth curves, and found that linear growth curves provide reliable and accurate forecasts for a maximum of 10 days ahead. For data that exhibit high volatility, researchers use volatility methods to estimate future returns. The daily number of deaths due to Covid-19 exhibits strong volatility clustering (just like financial returns), which justifies exploring the use of volatility models to predict the number of deaths due to Covid-19. Volatility models are commonly used to estimate the risk of financial returns [11], and the robustness of volatility models combined with heavy-tailed distributions in estimating financial risk has been researched extensively [12,13,14]. Surowiec and Warowny (2021) [14] employed the Value-at-Risk (VaR) concept to estimate the death rate from Covid-19 infection. Four Central European countries, namely Poland, Hungary, the Czech Republic and Slovakia, are used as "portfolios". The data runs from the 11th of January 2021 to the 28th of March 2021. The calculation methods report the VaR for the total number of deaths for the four countries over 14 days: the number of deaths is known for the initial 13 days and the 14th day is forecasted. They employ the variance–covariance approach, a parametric method that assumes the portfolio components are normal, so that the log-returns of the daily deaths are treated as normally distributed. According to Cont (2001) [15], however, log-returns have heavier tails than the normal distribution and exhibit volatility clustering.
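The log-return and variance–covariance ideas mentioned above can be sketched for a single series as follows. This is an illustrative simplification, not the calculation of Surowiec and Warowny [14], who work with a four-country portfolio: the function names, the sign convention (VaR taken as the lower quantile of the fitted normal distribution) and the death counts are all assumptions made for the example. The counts must be strictly positive for the logarithm to be defined.

```python
import math
from statistics import NormalDist, mean, stdev


def log_returns(counts):
    """Log-returns r_t = ln(y_t / y_{t-1}) of a strictly positive series."""
    return [math.log(b / a) for a, b in zip(counts, counts[1:])]


def normal_var(returns, level=0.95):
    """Parametric (normal) VaR for one series.

    Fits a normal distribution by sample mean and standard deviation and
    returns its lower (1 - level) quantile, e.g. the 5% quantile at 95%.
    """
    mu, sigma = mean(returns), stdev(returns)
    z = NormalDist().inv_cdf(1.0 - level)  # negative when level > 0.5
    return mu + z * sigma


# Hypothetical daily death counts, for illustration only (not the study's data).
deaths = [12, 15, 11, 20, 26, 19, 30, 41, 35, 28]
r = log_returns(deaths)
var_95 = normal_var(r, level=0.95)
print(f"95% normal VaR of the log-returns: {var_95:.4f}")
```

Because the normal distribution has thin tails, a quantile computed this way can understate extreme moves when the returns are heavy-tailed, which is the limitation raised by Cont [15].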

Therefore, this study departs from the studies of Surowiec and Warowny (2021) [14] and Reddy et al. (2021) [10] by incorporating volatility clustering, structural breaks, and heavy tails, using Markov-switching GARCH-type models combined with heavy-tailed distributions to estimate the minimum daily number of deaths due to Covid-19. To the best of our knowledge, VaR models incorporating the MS-GARCH-type model and heavy-tailed distributions have seen only limited use on health-related data. In the literature, we could not find any application of the MS-GARCH-type model combined with heavy-tailed distributions to estimating the risk of deaths due to Covid-19 infection.

This study is interested in estimating the minimum daily number of deaths due to Covid-19. To approximate the epidemiological and economic burden of an infectious disease, it is critical to estimate the minimum number of daily deaths. Methods well known from finance are used in this study because the daily deaths exhibit characteristics similar to those of financial returns, such as volatility clustering. Moreover, they exhibit heavier tails than the normal distribution [15].
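One common informal check for volatility clustering is the autocorrelation of the squared returns: if large moves tend to follow large moves, the squared series is positively autocorrelated even when the raw returns are not. The sketch below illustrates this check under stated assumptions. The helper autocorr and the return values are hypothetical and are not taken from the study's data.

```python
from statistics import mean


def autocorr(x, lag=1):
    """Sample autocorrelation of the sequence x at the given lag."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    return num / denom


# Hypothetical returns: a calm stretch followed by a turbulent stretch.
returns = [0.01, -0.02, 0.01, -0.01, 0.02, -0.01,
           0.30, -0.25, 0.35, -0.28, 0.31, -0.33]
squared = [r * r for r in returns]

rho_raw = autocorr(returns, lag=1)      # sign-alternating returns: negative
rho_squared = autocorr(squared, lag=1)  # clustered magnitudes: positive
print(f"lag-1 autocorrelation of returns:         {rho_raw:.2f}")
print(f"lag-1 autocorrelation of squared returns: {rho_squared:.2f}")
```

A clearly positive value for the squared series is the pattern that GARCH-type models are designed to capture. A formal analysis would use a proper test rather than a single lag.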

Exploratory data analysis and data description

Data description

The data used in this study is the daily deaths due to COVID-19 for South Africa, taken from the COVID-19 data repository for South Africa, which was constructed and is maintained and hosted by the Data Science for Social Impact research group led by Dr. Vukosi Marivate at the University of Pretoria. The data is collected from the National Institute for Communicable Diseases (NICD) and the Department of Health (DoH). It contains 514 observations from 23 March 2020 to 26 August 2021. The data can be accessed on the website: https://github.com/dsfsi/covid19za/blob/master/data/

Data exploration

Figure 1 shows the time series plot of the daily deaths for South Africa. It shows the trend in deaths due to Covid-19 in South Africa from day 0 to day 514. Day 0 corresponds to the first record of death on 23 March 2020, and day 514 is the last record of death on 26 August 2021.

In Fig. 1, the number of daily deaths shows an upward trend, which suggests that the data is non-stationary. There are also highs and lows (bumps), which could be due to structural breaks in the data. Table 1 displays the results of the tests for stationarity using the augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests.

Table 1 P-values for tests for stationarity
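Unit-root tests of this kind rest on a regression of the differenced series on its lagged level. As a rough illustration, the sketch below computes the t-statistic from a simplified Dickey-Fuller regression with a constant and no lagged differences. It is therefore neither the augmented test nor the PP test reported in Table 1, and it returns a test statistic rather than a p-value. The function name, the two synthetic series and the critical value of about -2.86 (an approximate large-sample 5% value for the constant-only case) are assumptions for the example.

```python
import math


def dickey_fuller_t(y):
    """t-statistic on g in the regression  dy_t = a + g * y_{t-1} + e_t.

    Simplified Dickey-Fuller regression: constant, no trend term and no
    lagged differences. Strongly negative values speak against a unit root.
    """
    x = y[:-1]                                  # lagged levels y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]      # first differences
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (d - md) for v, d in zip(x, dy))
    g = sxy / sxx                               # OLS slope
    a = md - g * mx                             # OLS intercept
    ssr = sum((d - a - g * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)         # standard error of the slope
    return g / se


CRITICAL_5PCT = -2.86  # approximate large-sample value, constant-only case

# Synthetic series for illustration only (not the study's data).
trending = [0.5 * t + math.sin(1.7 * t) for t in range(100)]
mean_reverting = [math.sin(1.7 * t) for t in range(100)]

t_trending = dickey_fuller_t(trending)
t_reverting = dickey_fuller_t(mean_reverting)
print(f"trending series:       t = {t_trending:.2f}")
print(f"mean-reverting series: t = {t_reverting:.2f}")
```

A statistic above the critical value, as for the trending series, means the unit-root hypothesis is not rejected, which is consistent with treating such a series as non-stationary. In practice, an established implementation of the ADF and PP tests would be used instead of this hand-rolled regression.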
