 Research
 Open Access
 Published:
Estimating the risk of SARSCoV2 deaths using a Markov switchingvolatility model combined with heavytailed distributions for South Africa
BMC Public Health volume 22, Article number: 1873 (2022)
Abstract
Background
SARSCoV2 (Covid19 virus) infection exposed the unpreparedness of African countries to healthrelated issues, South Africa included. Africa recorded more than 211 853 deaths as a consequence of Covid19. When rare and deadly diseases require urgent hospitalisation strikes, governments and healthcare providers are usually caught unprepared, resulting in huge loss of lives. Usually, at the beginning of such pandemics, there is no rich data for health practitioners and academics to be able to forecast the number of patients or deaths related to the pandemic. This study aims to predict the number of deaths associated with Covid19 infection. With the availability of the number of deaths on a daily basis, the results stemming from this study are important to inform and plan health policy.
Methods
This study uses the daily number of deaths due to Covid19 infection. Exploratory data analysis reveals that the data exhibits nonnormality, three structural breaks and volatility clustering characteristics. The Markov switching (MS)generalized autoregressive conditional heteroscedasticity (GARCH)type model combined with heavytailed distributions is fitted to the returns of the data. Using available daily reported Covid19related deaths up until 26 August 2021, we report 10day ahead forecasts of deaths. All forecasts are compared to the actual observed values in the forecasting period.
Results
The Anderson–Darling Goodness of fit test confirms that the fitted models are adequate for the data. The Kupiec likelihood ratio test and the root mean square error (RMSE) were used to select the robust model at different risk levels. At 95% the MS(3)GARCH(1,1) combined with Pearson’s type IV distribution (PIVD) is the best model. This indicates that the proposed bestfitting model is reasonable and can be used for predicting the daily number of deaths due to Covid19.
Conclusion
The MS(3)GARCH(1,1)PIVD model provides a reliable and accurate method for predicting the minimum number of death due to Covid19. The accuracy of the proposed model will assist policymakers, academics and health practitioners in forecasting the volatility of future healthrelated deaths in which the predictability of volatility plays an integral role in health risk management.
Introduction
SARSCoV2 a virus is commonly known as the Covid19 virus, first emerged in Wuhan, China, in December 2019 [1]. The virus rapidly spread to over 100 countries; South Africa included. Africa is a large population with a compromised immune system, has a high prevalence of diseases like Human Immunodeficiency Virus/Acquired Immunodeficiency syndrome, Malaria, Tuberculosis and many more [2]. The continent has a weak health care system and a poor economic discipline, and for these reasons, Africa is different from the other continents that are presently dealing with Covid19 [2].
In South Africa, the first case of covid19 was reported on the 5^{th} of March 2020, which was a result of international travel from Italy which led to the transmission of the virus locally [3]. Numerous ways of curbing the spread of Covid19 were communicated, including social distancing, regular sanitizing and washing of hands, quarantine, and lockdown [4]. Lockdown restricts people from leaving or entering buildings or other locations; this is done as a security measure in emergencies. The implementation of lockdown resulted in the closing of borders, the closure of schools and later the implementation of online learning in schools, and the closure of businesses. Further restrictions were imposed which included the banning of alcohol, closing of clubs and entertainment areas, the closing of churches, anything and everything that involved public gatherings. Even in funerals, a certain number of people was prohibited. Businesses reported a significant decline in their returns because of lockdown restrictions, especially those classified as "nonessential” [5]. Besides the threats posed by Covid19, South Africa had preexisting issues, including unstable economic growth, high unemployment rate, falling per capita income and unviable government debt trends [6].
South Africa was hit the hardest by Covid19 compared to other countries in Africa, reporting over 2,9 million confirmed cases and over 89 500 Covid19 related fatalities as of the 19^{th} November 2021. South Africa experienced significant peaks in the number of deaths during the first and second wave of the Covid19 pandemic, within the 2021 Midyear Population Estimates (MYPE) period between July 2020 and June 2021. This yielded a noticeable increase in the crude death rate (CDR) from 8,7 deaths per 1 000 people in 2020 to 11,6 deaths per 1 000 people in 2021. The rise in deaths in 2021 (approximated to be 34%), caused the 2021 Life Expectancy (LE) at birth to plunge for South Africa [5].
Shim et al. (2021) [7] estimated the risk of Covid19 deaths during the outbreak in Korea using timedelay adjusted crude case fatality risk (CFR). The data set used is from the Korea Centers for Disease Control and Prevention (KCDC). The data set is made up of geographic areas: Daegu, Gyeongsangukdo and other regions and Korea (national). The authors discovered that the extremely affected areas were Gyeongsangukdo and Daegu, other regions and the rest of Korea have a less severe profile. Furthermore, it was discovered that the fatality risk due to Covid19 increases with increasing age meaning that the older group exhibits a higher case fatality.
A similar study to the one done by Shim et al. [7] was done by Mizumoto and Chowell (2020) [8] in China, where they estimated the risk of death from Covid19 using the CFR. The data set used is from Hubei province and the city of Wuhan in China. The gamma, exponential and lognormal distributions were used. The results showed that the gamma distribution was the best fit for the delays from hospitalization to death and the lognormal produced the best fit for the delays from illness onset to death.
Nyabadza et al. (2020) [4] modelled the impacts of social distancing on Covid19 for South Africa using the susceptible exposedinfectedremoved (SEIR) model. The data was obtained from Coronavirus COVID19 (2019nCoV) data repository for South Africa, for March 2020. The model is fitted to the cumulative cases before lockdown and during the lockdown. The model showed that with the implementation of social distancing under the initial lockdown level between March 26 and April 13, 2020, there would be an increase in the number of infected cases. The model also looked at the effect of relaxing the social distancing measures after the first announcement of the lockdown. The results reveal that the relaxation of social distancing would increase the cases by a certain percentage and the opposite is true. The model results correctly forecasted the number of cases after the initial lockdown level was relaxed approaching the end of April 2020. These results have implications for the management and policy direction in the early phase of the epidemic.
Rivera et al. (2020) [9] investigated the excess mortality during Covid19 in the United States. This study looked at deaths directly and indirectly caused by Covid19 for 13 states with high Covid19 deaths, these states are Illinois, New Jersey, Massachusetts, Connecticut, New York, Washington, Colorado, Michigan, California, Florida, Indiana, Pennsylvania and Louisiana. The data used was collected from the National Center for Health Statistics (NCHS) Mortality Surveillance System (MSS) data release. A semiparametric model and a conventional model are used. It was found that the semiparametric model presents more advantages than the conventional approach. The authors concluded that all the states have an excess allcause that exceeds the number of deaths for Covid19.
Reddy et al. (2021) [10] and Surowiec and Warowny (2021) predicted the number of deaths due to Covid19. Reddy et al. (2021) [10] used a datadriven approach to predict the shortterm realtime total number of Covid19 cases and deaths using linear growth curves. Reddy et al. [10] found that linear growth curves provide reliable and accurate forecasts for a maximum period of 10 days ahead. For data that exhibit high volatility, researchers use volatility methods to estimate future returns. The daily number of deaths due to Covid19 exhibits high volatility clustering behaviour (just like volatility returns). This can be a justification for exploring the use of volatility models in the prediction of the number of deaths due to Covid19. Volatility models are commonly used to estimate the risk of financial returns [11]. Research on the robustness of volatility models combined with heavytailed distributions in estimating financial risk has extensively been done [12,13,14]. Surowiec and Warowny (2021) [14] employed the ValueatRisk (VaR) concept to estimate the death rate from Covid19 infection. Four Central European countries namely, Poland, Hungary, Czech Republic and Slovakia are used as “portfolios”. The data used is from 11^{th} of January 2021 to the 28^{th} of March 2021. The calculation methods report the VaR for the total number of deaths for the four countries over 14 days, the number of deaths is known for the initial 13 days and the 14^{th} day is forecasted. They employ the variance–covariance approach which is a parametric method that assumes that the portfolio components are normal, making the logreturns of the daily deaths normally distributed. According to Cont, (2001) [15] the logreturns have heavier tails compared to the normal distribution and they exhibit volatility clustering.
Therefore, this study departs from the study of Surowiec and Warowny (2021) [14] and Reddy et al. (2021) [10] by incorporating volatility clustering, structural breaks, and heavytailed distributions using the Markovswitching GARCHtype models combined with the heavytailed distribution in estimating the minimum daily number of deaths due to Covid19. To the best of our knowledge, there is restricted use of VaR models incorporating the MSGARCHtype model and heavytailed distributions on healthrelated data. In the literature, we could not find the application of the MSGARCHtype model combined with heavytailed distributions in estimating the risk of deaths due to Covid19 infection.
This study is interested in estimating the minimum daily number of death due to Covid19. To approximate the epidemiological and economic burden of infectious disease, it is critical to estimate the minimum number of daily deaths. The methods famously known to be used in Finance are used in this study, as the daily deaths have exhibited similar characteristics as financial returns, such as volatility clustering. Moreover, they exhibit heavier tails than the normal distribution [15].
Exploratory data analysis and data description
Data description
The data used in this study is the daily deaths due to COVID19 for South Africa from the COVID19 data repository for South Africa, which was constructed, sustained, and hosted by Data Science for social impact research group led by Dr. Vukosi Marivate at the University of Pretoria. The data is collected from the National Institute for Communicable Diseases (NICD) and the Department of Health (DoH). It contains 514 observations from 23 March 2020 to 26 August 2021. The data can be accessed on the website: https://github.com/dsfsi/covid19za/blob/master/data/
Data exploration
Figure 1 shows the time series plot for the daily deaths for South Africa. Figure 1 helps understand the trend of the deaths due to Covid19 in South Africa for 0 to 514 days. Day 0 corresponds to the first record of death on 23 March 2020, and day 514 is the last record of death on 26 August 2021.
In Fig. 1, the number of daily deaths indicates an increasing/upward trend, resulting in the data being nonstationary. There is also some highs and lows (bumps), which could be due to potential structural breaks in the data. Table 1 displays the tests for stationarity using the augmented DickeyFuller (ADF) and PhillipsPerron (PP) tests.
The pvalues of the ADF and PP test statistics are all less than 0.05 rejecting the null hypothesis of nonstationarity suggesting that the data is stationary at a 5% level of significance. Table 2 shows the descriptive statistics for the Covid19 data.
In Table 2, the skewness and kurtosis show that the data is nonnormal. Figure 2 shows the QQ plot for the data.
The QQ plot shows that the data points deviate from the line, indicating nonnormality. In the tails, the data points move away from the line, which indicates possible heavytails. Table 3 shows the tests for normality.
In Table 3, the null hypothesis that the data points follow a normal distribution is rejected by the Shapiro–Wilk, and Jarque Bera tests a 5% significance level. The data is, therefore, nonnormal.
To investigate the randomness of the data, Cox Stuart test result is provided in Table 4 and to further access the dependency of the daily deaths. Serial correlation is tested visually using an ACF plot in Fig. 3.
Table 4 shows Cox Stuart test, which is the tests of randomness. The null hypothesis of randomness is rejected at 5% significance level, which means that the data points are not identically and independently distributed.
Figure 3 shows the ACF plot for the data.
In Fig. 3, it can be seen that the data possesses serial correlation. This is supported by Table 5, which are tests of serial correlation.
The Durbin Watson test reject the null hypothesis of no serial correlation. The ARCH LMtest rejects the null hypothesis of no ARCH effects in the data, meaning that the data has ARCH effects present. Table 6 shows the results for the test of structural breaks in the data.
In Table 6, the null hypothesis of no structural breaks is rejected at 5% significance level, which indicates that the data does exhibit structural breaks. The breaks are reported at 3 points/dates, 04 June 2020, 9 February 2021 and 12 February 2021.
To estimate the risk of death due to Covid19, the concept of VaR is used. VaR estimation relies on certain distributional assumptions. Fitting a statistical distribution assumes that the data is stationary, independent, and identically distributed (iid.), with no serial correlation and no heteroscedasticity. The results previously reported show that the data is stationary, nonnormal, not independently and identically distributed, exhibits serial correlation and heteroscedasticity. Thus, the data is transformed. Possible transformations are explored using the BoxCox transformation technique, which revealed that the best transformation is the log transformation (\(\lambda <0.5\)). We define a logreturn as \({r}_{t}=\mathrm{log}\left(\frac{{d}_{t}}{{d}_{t1}}\right)\), where \({d}_{t}\) is the number of covid19 related death at day t and \({d}_{t1}\) is the number of covid19 related death at day \(t1\). Figure 4 shows the plot for the logreturns of the daily deaths for Covid19 for South Africa.
In Fig. 4, the transformed data seem to be stationary in the mean. However, there are still high and low fluctuations in the variance, which means volatility clustering is present in the data. Stationarity tests, namely the ADF and PP tests, are performed on the data to access the stationarity. Table 7 shows the pvalues of the ADF and PP test statistic.
Table 7 shows the pvalues of the stationarity tests, which are less than 0.05. The null hypothesis of nonstationarity is rejected at 5% significance level, and therefore the conclusion is that the log of the daily deaths is stationary. To explore more about the properties of the transformed data, the descriptive statistics were calculated. Table 8 shows the descriptive statistics for the transformed data.
The descriptive statistics in Table 8 reveal that the data is positively skewed, meaning that the tail on the righthand side is longer than the tail on the left, this indicates the nonsymmetric/nonnormality of the data. This can further be supported by the excess kurtosis more significant than 0, indicating that the data has thick and heavier tails than the normal distribution (is leptokurtic). Figure 5 shows the QQ plot, which is a visual representation of a normality test. Table 9 are the normality test.
The QQ plot in Fig. 5 indicates that the data points move away from the line, more significantly on the right tail, which supports the results of the data being heavytailed. Furthermore, the Shapiro–Wilk test and the Jarque–Bera test in Table 9 have pvalues less than 0.05, which is a clear indication that the null hypothesis of normality is rejected at 5% significance level. To investigate the randomness of the data, the Cox Stuart test is provided in Table 10, and to further assess if the data exhibits serial correlation and ARCH effects, Table 11 provides the ARCH LM test and Durbin Watson test.
Table 10 shows that the data is identically and independently distributed. In Table 11, the null hypothesis of no serial correlation is not rejected at 5% level of significance by the Durbin Watson test. The Arch LM test rejects the null hypothesis of no ARCH effects, meaning that ARCH effects are present in the data.
Summary of the EDA
From the exploratory data analysis, it can be concluded that the logreturns of the data exhibits,

Stationarity

Volatility clustering

NoSerial correlation

Arch effects (volatility clustering)

i.i.d.

three structural breaks

Heavy tails (nonnormal)
The summary of the exploratory data analysis leads to the following proposed model, a GARCH model to capture the volatility clustering and ARCH effects present in the data. Since the data revealed the presence of three structural breaks, a Markov SwitchingGARCHtype model is further proposed to capture the regime changes in the data. Heavytailed distributions, namely, Studentt distribution (StD), Skewed Studentt distribution (SStD), normal reciprocal inverse Gaussian distribution (NRIG), and Pearson type IV distribution (PIVD) to capture the nonnormality revealed by the data.
Methods
The GARCH model
The ARCH model by Eagle (1982) [16] is useful to forecast variance that changes over time. The GARCH model is a generalized version of the ARCH model, proposed by Bollerslev, (1986) [17]. The GARCH model allows for past conditional variances in the current conditional variance equation. The GARCH (p, q) model is written as,
From (1), the GARCH (1,1) is written as,
where \({h}_{t}\) is condional variance, \({a}_{t}={\sigma }_{t}{\epsilon }_{t},\) is the shock or random error, p and q are the number of lags in the model. and \({\epsilon }_{t}\sim N(\mathrm{0,1})\), the three positive numbers\(\omega\), \({\alpha }_{1}\) and \({\beta }_{1}\) are the parameters of the GARCH \((\mathrm{1,1})\) process and \({\alpha }_{1}+{\beta }_{1}<1\). The forecast horizon, \({\widehat{h}}_{t+n}\) converges to
The MSGARCH model
The GARCH model has been useful for capturing conditional variance, but in the case of structural shifts, the GARCH model may bias upward the GARCH estimates if the shifts are not accounted for. This may lead to the misinterpretation of the estimates [18]. Hamilton (1989) [19] proposed a nonlinear approach that accommodates changes in the regime. The approach uses a Markov switching regression to describe changes in the parameters of an autoregressive process.
Model specification
Ardia et al. (2018) [11] defined \({y}_{t}\in {\mathbb{R}}\) as the logreturns of an asset at time t. The mean of the logreturns is assumed to be zero and autocorrelated.
The Markovswitching can be specified as follows,
where \(\mathcal{D}\left(0,{h}_{k,t},{\xi }_{k}\right)\) is a continuous distribution with a mean zero, time changing variance \({h}_{k,t}\) and shape parameter \({\xi }_{k}\). \({s}_{t}\) is defined on the space \(\left\{1,\dots ,K\right\}\) which follows a homogeneous Markov chain with the transition probability matrix \(\mathbf{\rm P}\equiv {\left\{{p}_{i,j}\right\}}_{i,j=1}^{K}\), where.
\({p}_{i,j}={\mathbb{P}}[{s}_{t}\)=j\({s}_{t1}=i]\). \({h}_{k,t}\) is the variance of \({y}_{t}\). Conditional on regime \({s}_{t}=k\), \({h}_{k,t}\) is defined as a function of past returns and the additional regimedependant vector of parameters \({{\varvec{\theta}}}_{k}\).
The Markovswitching GARCH (1,1) is specified as,
where \({\omega }_{k}>0,{ \alpha }_{k}>0, {\beta }_{k}\ge 0\) and \({(\alpha }_{k}+{\beta }_{k})<1 (k=1,\dots ,K)\), \({{\varvec{\theta}}}_{k}\equiv {({\omega }_{k}, { \alpha }_{k}, {\beta }_{k})}^{T}\)
Model estimation
The Markovswitching GARCH model makes use of the maximum likelihood (ML) or the Bayesian Markov chain Monte Carlo (MCMC) estimation techniques. Both these techniques require evaluating the likelihood function [20].
The specifications and methods of the model can be found in the MSGARCH package in R.
Heavytailed distributions
Studentt distribution
The Gaussian GARCH model cannot explain the leptokurtosis present in asset returns [21]. it was suggested by Bollerslev (1987) [21] that the normality assumption of the error terms be replaced with that of the studentt distribution (StD). The form of the distribution can be written as,
The loglikelihood of the StD then becomes:
where \(\nu\) (number of degrees of freedom), \(\mu_{t}\) and \({\sigma }_{t}^{2}\) are the parameters.
Skewed studentt distribution
The StD was extended by Fernandez and Steel (1998) [22] which introduced the addition of a skewness parameter. The density function can be expressed as:
\(f\left(\epsilon \xi \right)\) is a unimodal density and \(\xi >0\) models the skewness.
Considering the random variable, \({z}_{t}=\frac{{\epsilon }_{t}m}{s}\) with mean equal to zero and variance equal to one, the standardized skewed Studentt loglikelihood is:
where \({I}_{t}=\left\{\begin{array}{c}1~ if~ {z}_{t}\ge \frac{m}{s}\\ 1 ~if~ {z}_{t}<\frac{m}{s}\end{array}\right.\)
Normal inverse Gaussian distribution
The normal inverse Gaussian (NIG) distribution is part of the generalised hyperbolic distribution (GHD). It was introduced by BarndorffNielsen (1994) [23]. To obtain the normal reciprocal inverse Gaussian (NRIG) distribution the parameter \(\lambda\) is set to \(\frac{1}{2}\).
The density function for a random variable \(Y\) which follows an NRIG distribution can be defined as follows:
Pearson type IV distribution
The system of curves referred to as Pearsonian was initially developed by Karl Pearson [24]. This family of curves contain distributions like the PIVD, normal distribution, StD and many more. The probability density function (pdf) of the PIVD is denoted by:
where \(n>1/2\), \(v\), \(a\) and \(\lambda\) are realvalued parameters, \(\infty <y<\infty\). The normalization constant \(c\) depends on \(n,v\) and \(a\). The parameters \(\lambda , v\) and \(a\) are location, skewness, and scale parameters respectively, where \(a>0\). The parameter \(n\) is kurtosis.
The loglikelihood of the PIVD can be expressed as follows:
\(N\) represents the number of observed data points in \({x}_{i}\). For further explanations, see [25].
Valueatrisk
Valueatrisk (VaR) can be defined as the maximum loss, that with a stated probability should not be exceeded during a given period. In market risk management, VaR measures the maximum plausible loss of a portfolio of a financial instrument over a given period [13].
For a stated probability, the VaR is defined as the pth quantile of \(F\) such that,
\({F}^{1}\) is the quantile function.
To access the precision of the results, backtesting is done on the VaR estimates. The Kupiec likelihood ratio test by Kupiec (1995) [26] is utilised. The Kupiec test is based on that a proper model should have a fraction of violations of VaR estimates near the respective tail probability \(\alpha\) [27].
The null hypothesis of the Kupiec likelihood ratio test is that the expected proportions of violations is equal to \(\alpha\). The test statistic can be written as:
\({y}^{\alpha }\) counts the amount of times where observed returns fall beneath (for long positions) or above (for short positions) the VaR estimate at level \(\alpha\).
The return process for the portfolio is defined as,
where \({h}_{t}\) is the volatility process and \({z}_{t}\) are the standardized residuals which are written as,
The VaR estimates forecast for the next day is defined as,
Empirical results
In this section, the data is described, and the data source is provided. Furthermore, the exploration of the data is done to reveal the intrinsic properties of the data. The suggested modelMSGARCH model is fitted with heavytailed distributions, and the model adequacy results are described.
A GARCH (1,1) model with normal distribution governing the innovations was fitted to the transformed data. Table 12 shows the results of the fitted GARCH (1,1) model with normal distribution governing the residuals.
In Table 12, all the parameter estimates for the GARCH (1,1) model are statistically significant at 5% level of significance. This means that a GARCH (1,1) model would be appropriate to fit the data. To check for asymmetric effects, the signbias test was used. Table 13 shows the results of the sign bias test.
The sign bias test has a pvalue greater than 0.05, therefore the null hypothesis of no asymmetric effects is not rejected at 5% level of significance. This means that an appropriate model should be symmetric. The next section of this paper explores a GARCHtype model with no asymmetric effects.
Markovswitching GARCH model
In this section, an MS(k)GARCH(1,1) model is fitted, the extracted residuals are tested for normality, and the appropriate heavytailed distributions are fitted to the standard residuals. The goodness of fit is also provided for the fitted models.
Due to the presence of three structural breaks in the data and the GARCH(1,1) best fitting the data, symmetric MS(k)GARCH(1,1) model with three structural breaks (\(k=3\)) is fitted to the data. Table 14 shows the results.
Table 14 shows the parameter estimates, standard error, and pvalues of the parameters of the MSGARCH model. The pvalues are less than 0.05, which means that the parameters are all statistically significant at a 5% significance level.
After extracting the residuals, they are tested for normality. Table 15 shows the basic statistics of the standardized residuals.
Table 15 shows that the residuals are positively skewed and that the residuals are extremely leptokurtic, which is seen by the big excess kurtosis. All this indicates that the residuals are heavytailed. Figure 6 shows the QQ plot for the residuals.
The QQ plot in Fig. 6 shows the data points for the residuals deviating from the normal line, especially on the righthand side. This supports the results of the kurtosis and skewness that the data exhibits heavier tails than the normal distributions. Table 16 shows the normality tests for the residuals.
The Shapiro–Wilk and Jarque Bera tests in Table 16 have pvalues less than 0.05, which means that the residuals do not follow a normal distribution. Since the residuals are nonnormal and heavytailed, we capture the nonnormality by fitting heavytailed distributions to the residuals. The Anderson–Darling (AD) statistic is used to check whether the fitted distributions fit the data well. Table 17 shows the parameter estimates for the MS(3)GARCH(1,1)with the heavytailed distributions governing the innovations and the pvalue for the Anderson Darling (AD) statistic.
From Table 17, the pvalues of the AD statistic are greater than 0.05 indicating that the fitted MS(3)GARCH(1,1)models with heavytailed distributions fit the data well. Thus, we estimate the VaR values at different VaR levels and backtest the VaR values using the Kupeic likelihood ratio test.
VaR estimation and backtesting
In Finance, VaR measures the risk of loss for investments. It estimates by how much a set of investments might lose (with a certain probability), given normal market conditions, in a given period such as a day. This paper looks at how many people will die each day at a given/specified probability level in South Africa. Table 18 shows the VaR estimates for the MS(3)GARCH(1,1) model fitted with heavytailed distributions.
From Table 18, we can observe that the MS(3)GARCH(1,1) model combined with NRIGD and PIVD produce VaR estimates that are closer to those of the empirical distribution at most VaR levels. This is not surprising, as we have seen that the MS(3)GARCH(1,1)NRIGD and MS(3)GARCH(1,1)PIVD provide a better fit to the data than MS(3)GARCH(1,1)StD and MS(3)GARCH(1,1)SStD. In order to check the adequacy of these VaR estimates, the Kupiec likelihood ratio test is used. The pvalues of the Kupiec likelihood ratio test at various significance levels are shown in Table 19.
Backtesting confirms which model is the best for estimating risk, this is the model with the highest pvalue. Table 19 shows that the MS(3)GARCH(1,1)NRIGD is the best model at 90%. At 95%, the MS(3)GARCH(1,1)SStD and MS(3)GARCH(1,1)PIVD are the best models, and MS(3)GARCH(1,1)StD, MS(3)GARCH(1,1)NRIGD, and MS(3)GARCH(1,1)PIVD are the best models at 97.5%. The MS(3)GARCH(1,1)NRIG is the best model at 99%. Thus, using the Kupiec likelihood ratio test for model selection, at 95% level, the best fitting models are the MS(3)GARCH(1,1)SStD and MS(3)GARCH(1,1)PIVD.
We use MS (3)GARCH (1,1) combined with heavy tailed distributions, (3) and (17) to forecast the outofsample minimum number of daily deaths i.e. minimum deaths due to Covid19 for 10 days head. In (17), to predict the minimum number of deaths at time t, we consider, \({\mu }_{T=1}\) the average number of deaths due to Covid19 for the period 27 March 2020 to 26 August 2021, which is the insample data, \({\mu }_{T=2}\) is the average number of deaths for the period 28 March 2020 to 27 August 2021, Thus, includes the forecasted number of deaths on 27 August 2021. \({\mu }_{T}\) for the periods \(T=3 \mathrm{up to} 10\) are estimated in the same way. In Table 20, \({\mathbf{V}}_{\mathbf{T}}\) denotes the actual/observed deaths per day, and \({\mathbf{V}}_{\mathbf{T}+1}\) denotes the forecasted death value. To assess the accuracy of the models in predicting the number of daily deaths due to Covid19 infection with a 95% probability, we present the actual observed and predicted values in the forecasting period in Table 20.
In Table 20, it can be said that with a probability of 95%, the number of deaths that will occur at 1 day ahead (27^{th} August 2021) will be greater than \(322.\) We illustrate how we forecast \({\mathbf{V}}_{\mathbf{T}+1}\) using (3) and (19): \({\widehat{h}}_{t+1}=\frac{0.0666}{1(0.0033+0.0001)}\) and \({VaR}_{t+1}=357\times \left(\left1.38561.8610\sqrt{{\widehat{h}}_{t+1}}\right\right)\), for MS(3)GARCH(1,1)PIVD hybrid model. We assess the predictive accuracy of the fitted models at 95% using the Root Mean Square Error, defined as
The best fitting model has a RMSE value of 74. The suggest that the proposed model is reasonable and can be used as an early warning tool to inform the government health officials and other stakeholders about the future daily number of deaths for South Africa.
In view of the existing challenges in the provision of healthcare in South Africa, reliable and accurate shortterm forecasts of Covid19 related deaths are critical to ensure optimal allocation of scarce resources. Having accurate and reliable information on the possible number of deaths in advance is a fundamental strategy to prepare for the number of beds required for hospitalisation, ventilators and health practitioners needed to mitigate against the predicted number of deaths due to Covid19 infection on forecasted days.Thereby, reducing the effect of untimely deaths due to Covid19 infection. The importance of predicting Covid19related deaths measures the effectiveness of all the mitigating efforts by Governments, such as vaccinations and lockdowns. It also measures the extent of the damage caused by Covid19 infection.
Conclusion
The SARSCoV2 pandemic is a global issue, threatening the human population and the economy.The estimation of deaths due to Covid19 may interest health analysts, policymakers, biostatisticians. This study estimates the risk of death due to the Covid19 virus using a MarkovSwitching GARCH type model combined with heavytailed distributions. The risk of death is estimated using the concept of VaR as it is used in financial time series. The results show that the MS(3)GARCH(1,1)–NRIGD is the best model at 90%. At 95% the MS(3)GARCH(1,1)–SStD and MS(3)GARCH(1,1)PIVD are the best models, and MS(3)GARCH(1,1)–StD, MS(3)GARCH(1,1)–NRIGD, MS(3)GARCH(1,1)–PIVD are the best at 97.5%. The MS(3)GARCH(1,1)–NRIG is the best model at 99%. Overall MS(3)GARCH(1,1)NRIGD is the best model, it outperforms all the models at almost all the levels. The VaR results showed that with a probability of 95%, the minimum number of daily deaths that will occur for the next 10 days is reasonable with a root mean square error of 74. This suggests that the model has a high predictive accuracy and can be used in making informed decisions by government health officials and policymakers.
Availability of data and materials
The datasets generated and/or analysed during the current study are available in the University of Pretoria repository https://github.com/dsfsi/covid19za/blob/master/data/.
References
Pham H. On estimating the number of deaths related to Covid19. Mathematics. 2020;8(5):655.
Lone SA, Ahmad A. COVID19 pandemic–an African perspective. Emerg Microbes Infect. 2020;9(1):1300–8.
NICD. First case of COVID19 coronavirus reported in SA. 2020. Available at: https://www.nicd.ac.za/firstcaseofcovid19coronavirusreportedinsa/.
Nyabadza F, Chirove F, Chukwu CW, Visaya MV. Modelling the potential impact of social distancing on the COVID19 epidemic in South Africa. Comput Math Methods Med. 2020;2020:1–12.
Statistics South Africa. Business impact survey of the COVID19 pandemic in South Africa. 2020. Available at: http://www.statssa.gov.za/publications/Report008001/Report008001April2020.pdf.
de Villiers C, Cerbone D, Van Zijl W. The South African government's response to COVID19. J Public Budgeting Acct Financial Manage. 2020;32(5):797–811.
Shim E, Mizumoto K, Choi W, Chowell G. Estimating the risk of COVID19 death during the course of the outbreak in Korea, February–May 2020. J Clin Med. 2020;9(6):1641.
Mizumoto K, Chowell G. Estimating risk for death from coronavirus disease, China, january–february 2020. Emerg Infect Dis. 2020;26(6):1251.
Rivera R, Rosenbaum J, Quispe W. Estimating excess deaths in the United States early in the COVID19 pandemic. medRxiv. 2020.
Reddy T, Shkedy Z, Van Rensburg CJ, Mwambi H, Debba: Zuma, K. and Manda, S. Shortterm realtime prediction of total number of reported COVID19 cases and deaths in South Africa: a data driven approach. BMC Med Res Methodol. 2021;21(15):1–11.
Ardia D, Bluteau K, Boudt K, Catania L. Forecasting risk with Markovswitching GARCH models: A largescale performance study. Int J Forecast. 2018;34(4):733–47.
Chinhamu K, Chifurira R. Evaluating South Africa’s market risk using asymmetric power autoregressive conditional heteroscedastic model under heavytailed distributions. J Econ Financial Sci. 2019;12(1):1–11.
Liu R, Shao Z, Wei G, Wang W. Volatility estimation for Bitcoin: a comparison of GARCH models. J Acct Bus Fin Res. 2017;1:71–5.
Surowiec A, Warowny T. Covid19 Death Risk Estimation Using VaR Method. Eur Res Stud. 2021;24:368–79.
Cont R. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative finance. 2001;1(2):223.
Eagle RF. Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation. Econometrica. 1982;50(4):987–1007.
Bollerslev T. Generalized autoregressive conditional heteroskedasticity. J Econ. 1986;31(3):307–27.
Lamoureux CG, Lastrapes WD. Persistence in variance, structural change, and the GARCH model. J Bus Econ Stat. 1990;8(2):225–34.
Hamilton JD. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica J Econ Soc. 1989;57(2):357–89.
Ardia D, Bluteau K, Boudt K, Catania L, Trottier DA. Markovswitching GARCH models in R: The MSGARCH package. J Stat Software. 2019;91(4):1–38.
Bollerslev T. A conditionally heteroskedastic time series model for speculative prices and rates of return. Rev Econ Stat. 1987;69(3):542–7.
Fernández C, Steel MF. On Bayesian modeling of fat tails and skewness. J Am Stat Assoc. 1998;93(441):359–71.
BarndorffNielsen OE. Normal\inverse Gaussian processes and the modelling of stock returns. Department of Theoretical Statistics: Aarhus Universitet; 1995.
Bhattacharyya M, Chaudhary A, Yadav G. Conditional VaR estimation using Pearson’s type IV distribution. Eur J Oper Res. 2008;191(2):386–97.
Nagahara Y. The PDF and CF of Pearson type IV distributions and the ML estimation of the parameters. Statist Probab Lett. 1999;43(3):251–64.
Kupiec H. Techniques for verifying the accuracy of risk measurement models (Vol. 95, No. 24). Division of Research and Statistics, Division of Monetary Affairs, Federal Reserve Board. 1995.
Chifurira R, Chinhamu K. Using the generalized Pareto and Pearson typeiv distributions to measure valueatrisk for the daily South African mining index. Stud Econ Econ. 2017;41(1):33–54.
Acknowledgements
The authors thank Prof Henry Mwambi for providing the link to obtain the data used in this study. The authors are grateful to the anonymous referees for careful checking of the details and for helpful comments that improved this paper.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
NM performed the analysis and drafted the manuscript. RC and KC provided valuable edits to the manuscript at several stages. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Mthethwa, N., Chifurira, R. & Chinhamu, K. Estimating the risk of SARSCoV2 deaths using a Markov switchingvolatility model combined with heavytailed distributions for South Africa. BMC Public Health 22, 1873 (2022). https://doi.org/10.1186/s12889022142498
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12889022142498
Keywords
 SARSCoV2 (Covid19) infection
 Death
 Volatility models
 Valueatrisk
 Heavytailed distributions
 Structural breaks