 Technical advance
New ways of estimating excess mortality of chronic diseases from aggregated data: insights from the illness-death model
BMC Public Health volume 19, Article number: 844 (2019)
Abstract
Background
Recently, we have shown that the age-specific prevalence of a disease can be related to the transition rates in the illness-death model via a partial differential equation (PDE). The transition rates are the incidence rate, the remission rate and the mortality rates from the ‘Healthy’ and ‘Ill’ states. In the case of a chronic disease, we now demonstrate that the PDE can be used to estimate the excess mortality from age-specific prevalence and incidence data. For the prevalence and incidence, aggregated data are sufficient; no individual subject data are needed, which allows the methods to be applied in contexts of strong data protection or where data from individual subjects are not accessible.
Methods
After developing novel estimators for the excess mortality derived from the PDE, we apply them to simulated data and compare the findings with the input values of the simulation to evaluate the new approach. In a practical application to claims data from 35 million men insured by the German public health insurance funds, we estimate the population-wide excess mortality of men with diagnosed type 2 diabetes.
Results
In the simulation study, we find that the estimation of the excess mortality is feasible from prevalence and incidence data if the prevalence is given at two points in time. The accuracy of the method decreases as the temporal difference between these two points in time increases. In our setting, the relative error was 5% and below if the temporal difference was three years or less. Application of the new method to the claims data yields plausible findings for the excess mortality of type 2 diabetes in German men.
Conclusions
The described approach is useful to estimate the excess mortality of a chronic condition from aggregated age-specific incidence and prevalence data.
Trial registration
The article does not report the results of any health care intervention.
Background
Recently, we have shown that the age-specific prevalence of a health state or disease can be related to the transition rates in the illness-death model via a partial differential equation (PDE) [1, 2]. The transition rates are the incidence rate, the remission rate and the mortality rates from the Healthy and Ill states (Fig. 1). In the case of a chronic disease, i.e. a disease with no remission, this relation can be used to estimate the incidence from a sequence of cross-sectional studies if information about mortality is available [3]. This might be an alternative way to estimate the incidence of a chronic condition in situations where follow-up studies are challenging to conduct or not feasible at all.
In this article, we demonstrate that it is also possible to estimate excess mortality from the age-specific prevalence and incidence of a chronic disease. This can be useful for the analysis of data where it is difficult to observe mortality directly, for instance in disease registers [4] or health insurance claims data where deaths might be reported with a delay [5]. Another example where the excess mortality of a chronic condition cannot be estimated directly is the US National Health Interview Survey (NHIS) from the National Center for Health Statistics [6]. NHIS is a yearly cross-sectional household interview survey with up to 90,000 participants each year. Usually, participants are followed up for mortality by linkage to the National Death Index. This implies that it is possible to check the vital status of a participant from a previous cross-sectional interview, but it is not possible to decide whether a deceased participant who had been disease-free at the interview contracted the disease in the period between the interview and the date of death. In other words, for a subject who was disease-free at the interview, it is not possible to determine the disease status at death. Thus, when estimating mortality, it is unclear whether such a death should be attributed to the mortality of the healthy or of the diseased subjects.
To overcome these problems, we examine mathematical relations of the illness-death model and associated PDEs to develop reliable estimators for excess mortality.
Methods
Illness-death model
We consider the illness-death model as shown in Fig. 1. Each subject of the population is in one of the relevant disease states: Healthy (with respect to the considered chronic disease), Ill or Dead. Let the number of people aged a at calendar time t in the Healthy and Ill states be denoted by H(t, a) and I(t, a), respectively. Subjects can pass from both states into the (absorbing) state Dead. The transition rates between the three states are the incidence rate (i), the remission rate (r), the mortality rate of the healthy (m_{0}) and the mortality rate of the diseased (m_{1}). These rates usually depend on calendar time t and on age a. Henceforth, we consider only chronic, i.e., irreversible diseases, which is equivalent to a remission rate of zero (r = 0).
To develop estimators for the excess mortality Δm = m_{1} – m_{0}, we use mathematical relations between the incidence, the prevalence p(t, a) = I(t, a)/{I(t, a) + H(t, a)} and the mortality rates in the illness-death model.
An alternative epidemiological measure to Δm for assessing discrepancies between the mortality rates m_{0} and m_{1} is the mortality rate ratio R = m_{1}/m_{0}, which is of potential interest for practitioners. The mortality rate ratio R expresses the mortality rate of the diseased people relative to the non-diseased at the same age. Due to this plain interpretation, R is more often used than the (absolute) excess mortality Δm. Both measures, Δm and R, are related by R = 1 + Δm/m_{0}.
Direct estimation in simulated data about dementia
To illustrate how measures of excess mortality in a chronic disease can be estimated directly from incidence and prevalence data, we conduct a simulation study. We mimic a sequence of two cross-sectional studies for a chronic disease in two different years t_{1} and t_{2} centered at the year t = 2000. Let ΔT = t_{2} – t_{1} denote the difference between t_{1} and t_{2}, i.e. t_{1} = 2000 – ΔT/2 and t_{2} = 2000 + ΔT/2. In each of the cross-sectional studies at t_{1} and t_{2}, the age-specific prevalence p is surveyed (Fig. 2).
The aim is to estimate the excess mortality at year t = 2000 from the cross-sectional prevalence data at t_{1} and t_{2} and the incidence. To assess the impact of the temporal difference between the cross-sectional studies, we vary ΔT from 0.1 to 10 (years). Together with the age-specific incidence rate i at t = 2000, the prevalence data in the two years t_{1} and t_{2} serve as input values to estimate the excess mortality in the year t = 2000. The estimated excess mortality is then compared with the rates used to set up the simulation study in terms of absolute and relative bias.
The input data for the simulation are motivated by survey data about dementia in the female population of Europe [7]. Dementia is a major health problem in many countries with potentially increasing prevalence in the future [8]. The age-specific prevalence p for each of the two years t_{1} and t_{2} is calculated analytically with the incidence rate i from [7]. The age-specific mortality rate m_{0} of the dementia-free population is chosen to be m_{0}(t, a) = exp(−10.7 + 0.1a + t ln(0.99)), aiming to approximate the mortality of the European population based on the Gompertz-Makeham law of mortality [9]. In addition, we assume that the mortality m_{1} of the diseased people can be written as a product of m_{0} and R: m_{1}(t, a) = R(t, a) × m_{0}(t, a) with log R(t, a) = log(3) + [log(1.5) – log(3)] (a – 60)/(90 – 60). The rationale for choosing this R is the idea that m_{1} also follows a Gompertz-Makeham law. Then, the logarithm of the quotient m_{1}/m_{0} is a straight line as given here. The specific numerical values in the definition of R are chosen to mimic the age-dependency reported in [10], where R was found to be about 3 and 1.5 at 60 and 90 years of age, respectively. Note, however, that in this simulation we want to demonstrate the feasibility of the method in a realistic range of parameters. We do not aim for the best obtainable agreement between our input data and the observed data.
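As a minimal sketch, the mortality inputs of the simulation can be written down in Python. The function names are hypothetical, and the text does not state the origin of the time axis t, so the example evaluates the rates at t = 0 purely for illustration.

```python
import math


def m0(t, a):
    """Mortality rate of the dementia-free population, as given in the text."""
    return math.exp(-10.7 + 0.1 * a + t * math.log(0.99))


def rate_ratio(a):
    """Mortality rate ratio R: log-linear in age, 3 at age 60 and 1.5 at age 90."""
    slope = (math.log(1.5) - math.log(3.0)) / (90.0 - 60.0)
    return math.exp(math.log(3.0) + slope * (a - 60.0))


def m1(t, a):
    """Mortality rate of the diseased: m1 = R * m0."""
    return rate_ratio(a) * m0(t, a)


def excess_mortality(t, a):
    """Excess mortality, i.e. the difference m1 - m0."""
    return m1(t, a) - m0(t, a)


# Illustration at t = 0 (assumed time origin, not stated in the text).
for age in (60.0, 75.0, 90.0):
    print(age, rate_ratio(age), m0(0.0, age), excess_mortality(0.0, age))
```

Because log R is linear in age, R at age 75 is the geometric mean of 3 and 1.5, and each additional unit of t multiplies m_{0} by 0.99.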
Bayes estimation and application to claims data
After describing the direct estimation, we present an estimation method in the framework of Bayesian inference. Bayes methods are increasingly used in applied statistics because they provide a flexible framework for the analysis of scientific problems and for quantifying uncertainty in their solution [11]. As an application of the Bayesian approach, we estimate the excess mortality of type 2 diabetes in the year 2012 from claims data comprising 35 million German men. Goffrier and colleagues [12] reported the age-specific prevalence of diabetes among German men in the years t_{1} = 2009 and t_{2} = 2015 as shown in Fig. 3. In the same work, the age-specific incidence rate i in 2012 was surveyed. The data for this analysis are publicly available and can be found in [12].
Our aim in the diabetes example is to estimate the age-specific mortality rate ratio R in the range 50 to 90 years of age. Recently, the mortality rate ratio has been estimated for a smaller age range by Tönnies et al. [13]. Compared to [13], we extend the age range by means of the novel Bayesian approach.
The idea for the Bayes method is that for given age-specific prevalence p, incidence rate i and general mortality m, an estimate of the excess mortality in terms of the mortality rate ratio R is desired. According to the Theorem of Bayes [11] we obtain
\[ f(R \mid p) \propto f(p \mid R) \times f(R), \tag{1} \]
where f(R | p) is the a posteriori distribution of R, f(p | R) is the probability density function of p given R and f(R) denotes the a priori distribution of R. For clarity, we assume that i and m are known. Motivated by empirical findings from the Danish Diabetes Register [14], we assume that the logarithm of the age-specific mortality rate ratio R is approximately a straight line in the age range 50 to 90 years:
\[ \log R(a) = \log R(50) + \left[ \log R(90) - \log R(50) \right] \frac{a - 50}{90 - 50}. \tag{2} \]
For the estimation of R(50) and R(90) in Eq. (2), we use weakly informative prior distributions R(50) ~ U(2; 9) and R(90) ~ U(1; 2), again inspired by the Danish Diabetes Register [14]. U(v; w) denotes the continuous uniform distribution with minimum value v and maximum value w. In Bayesian terminology, our aim is to estimate the joint a posteriori distribution for R(50) and R(90).
To use Eq. (1) for the estimation of R given p, we apply three steps: 1) values for R(50) and R(90) are drawn from the uniform prior distributions, 2) the PDE is solved with the initial condition p(2009; a) as given in [12], and 3) the calculated solution p in 2015 is compared with the surveyed values.
For solving the PDE, we use the Method of Characteristics [15] to first convert the PDE into an ordinary differential equation (ODE) and then solve the ODE by the Runge-Kutta method of fourth order [16]. Next, the calculated prevalence in 2015, p(2015; a), is compared with the observed prevalence in 2015 given by [12]. The age-specific prevalences p in the years 2009 and 2015 are shown as black and blue lines in Fig. 3, respectively. As conditional distribution f(p | R), we chose the multivariate normal distribution
\[ f(p \mid R) \propto \exp\left\{ -\tfrac{1}{2} \left( p_{obs} - p_{mod} \right)^{T} \Sigma^{-1} \left( p_{obs} - p_{mod} \right) \right\}, \]
where p_{mod} = p_{mod}(R) is the solution of the PDE for a given R. The conditional distribution f(p | R) assesses the differences between the modeled prevalences p_{mod} and the observed prevalences p_{obs}. The covariance matrix Σ is estimated by the following diagonal matrix:
\[ \Sigma = \mathrm{diag}\left( \frac{p_{j} \left( 1 - p_{j} \right)}{n_{j}} \right), \]
with age-specific prevalences p_{j} and the corresponding number of people n_{j} in age group j. Choosing the covariance matrix as a diagonal matrix makes the implicit assumption that the prevalences p_{j} are stochastically independent. A justification for this assumption is the fact that the people belonging to one age group are different from the people in another age group.
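The three estimation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the incidence, general mortality, starting prevalence and group sizes are hypothetical stand-ins for the data in [12], the rates are taken as constant over calendar time, and the variance of each prevalence is assumed to be the binomial p(1 − p)/n. Along a characteristic, where age and calendar time advance together, the PDE is used in the form dp/ds = (1 − p){i − m p(R − 1)/(1 + p(R − 1))}, which follows from m = p m_{1} + (1 − p) m_{0} and m_{1} = R m_{0}.

```python
import numpy as np


def incidence(a):
    # Hypothetical age-specific incidence rate (per person-year), not the data of [12].
    return 0.002 * np.exp(0.04 * (a - 50.0))


def general_mortality(a):
    # Hypothetical general mortality m (per person-year).
    return np.exp(-10.0 + 0.1 * a)


def rate_ratio(a, r50, r90):
    # log R(a) is linear in age between R(50) = r50 and R(90) = r90.
    w = (a - 50.0) / 40.0
    return np.exp((1.0 - w) * np.log(r50) + w * np.log(r90))


def dp_ds(a, p, r50, r90):
    # Right-hand side of the ODE along a characteristic.
    r = rate_ratio(a, r50, r90)
    excess = general_mortality(a) * p * (r - 1.0) / (1.0 + p * (r - 1.0))
    return (1.0 - p) * (incidence(a) - excess)


def propagate(p_start, a_start, years, r50, r90, steps=12):
    # Classical fourth-order Runge-Kutta scheme; each cohort ages by `years`.
    h = years / steps
    a = np.asarray(a_start, dtype=float).copy()
    p = np.asarray(p_start, dtype=float).copy()
    for _ in range(steps):
        k1 = dp_ds(a, p, r50, r90)
        k2 = dp_ds(a + h / 2, p + h / 2 * k1, r50, r90)
        k3 = dp_ds(a + h / 2, p + h / 2 * k2, r50, r90)
        k4 = dp_ds(a + h, p + h * k3, r50, r90)
        p = p + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        a = a + h
    return p


def log_likelihood(p_obs, p_mod, n):
    # Normal log-density with diagonal covariance, up to an additive constant.
    var = p_obs * (1.0 - p_obs) / n
    return -0.5 * np.sum((p_obs - p_mod) ** 2 / var)


rng = np.random.default_rng(1)
ages_2009 = np.arange(50.0, 85.0)            # cohorts reach ages 56 to 90 in 2015
p_2009 = 0.08 + 0.004 * (ages_2009 - 50.0)   # hypothetical prevalence in 2009
n = np.full(ages_2009.shape, 200_000)        # hypothetical group sizes
p_2015 = propagate(p_2009, ages_2009, 6.0, 4.5, 1.4)  # synthetic "observed" data

# Step 1: draw from the uniform priors; steps 2 and 3: solve and compare.
draws = np.column_stack([rng.uniform(2, 9, 500), rng.uniform(1, 2, 500)])
ll = np.array([
    log_likelihood(p_2015, propagate(p_2009, ages_2009, 6.0, r50, r90), n)
    for r50, r90 in draws
])
r50_map, r90_map = draws[np.argmax(ll)]
print(r50_map, r90_map)
```

With uniform priors, the draw with the largest log-likelihood approximates the maximum a posteriori estimate; a finer grid or a proper sampler would be needed for credibility intervals.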
In a sensitivity analysis, we replaced the weakly informative priors (R(50) ~ U(2; 9), R(90) ~ U(1; 2)) by more informative ones and examined the impact on the estimation of R(50) and R(90). For this, we chose R(50) and R(90) from a bivariate normal distribution with mean (5.5, 1.5), standard deviations of 1 and 0.1 for R(50) and R(90), respectively, and a correlation coefficient of 0.9 between R(50) and R(90). These assumptions lead to the following covariance matrix for the joint distribution of R(50) and R(90):
\[ \begin{pmatrix} 1 & 0.09 \\ 0.09 & 0.01 \end{pmatrix}. \]
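The entries of this covariance matrix follow from the stated standard deviations and correlation, since the covariance of two variables is the product of their standard deviations and their correlation. A short Python sketch (variable names are hypothetical) builds the matrix and draws a few values from the resulting prior:

```python
import numpy as np

sd = np.array([1.0, 0.1])                  # standard deviations of R(50) and R(90)
rho = 0.9                                  # correlation between R(50) and R(90)
corr = np.array([[1.0, rho], [rho, 1.0]])
cov = np.outer(sd, sd) * corr              # variances on the diagonal, rho*sd1*sd2 off it

rng = np.random.default_rng(0)
prior_draws = rng.multivariate_normal(mean=[5.5, 1.5], cov=cov, size=5)
print(cov)
print(prior_draws)
```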
Results
Illness-death model
The age-specific prevalence p(t, a) = I(t, a)/{H(t, a) + I(t, a)}, i.e., the percentage of people aged a at time t who are chronically ill, is the solution of the following partial differential equation (PDE):
\[ \left( \partial_{t} + \partial_{a} \right) p = \left( 1 - p \right) \left\{ i - p \left( m_{1} - m_{0} \right) \right\}. \tag{3} \]
In Eq. (3), ∂_{t} and ∂_{a} denote the partial derivatives with respect to t and a, respectively. The mathematical proof for Eq. (3) can be obtained from examining the change rates of the number of healthy and ill people in the illness-death model (H and I in Fig. 1) [17] or by using the theory of stochastic processes [2].
Eq. (3) implies that the excess mortality Δm = m_{1} – m_{0} can be estimated directly from the incidence rate i, the prevalence p and the temporal change of the prevalence ((∂_{t} + ∂_{a})p):
\[ \Delta m = \frac{ i \left( 1 - p \right) - \left( \partial_{t} + \partial_{a} \right) p }{ p \left( 1 - p \right) }. \tag{4} \]
Note that for the direct estimation of the excess mortality Δm by Eq. (4), only the incidence rate i and the prevalence-based figures p and (∂_{t} + ∂_{a})p are necessary. No additional data are needed.
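Solving the PDE (∂_{t} + ∂_{a})p = (1 − p){i − p Δm} for Δm gives Δm = {i(1 − p) − (∂_{t} + ∂_{a})p}/{p(1 − p)}. A minimal Python sketch of this direct estimator follows; the function name and the numerical values are hypothetical and serve only as a round-trip check.

```python
def excess_mortality(inc, prev, dprev):
    """Excess mortality from incidence, prevalence and (d/dt + d/da) of prevalence."""
    if not 0.0 < prev < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return (inc * (1.0 - prev) - dprev) / (prev * (1.0 - prev))


# Round trip with made-up values: compute the change of the prevalence from
# the PDE for a known excess mortality, then recover that excess mortality.
inc, prev, true_dm = 0.01, 0.2, 0.03
dprev = (1.0 - prev) * (inc - prev * true_dm)
print(excess_mortality(inc, prev, dprev))  # approximately 0.03
```

The estimator divides by p(1 − p), so it becomes unstable where the prevalence is close to 0 or 1.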
Instead of using Eq. (3) for a relation between the incidence, prevalence and mortality, an alternative is to consider the prevalence-odds θ(t, a) = I(t, a)/H(t, a). For the prevalence-odds θ we find the following PDE, which is equivalent to Eq. (3):
\[ \left( \partial_{t} + \partial_{a} \right) \theta = i - \theta \left( m_{1} - m_{0} - i \right). \tag{5} \]
Equation (5) was first published by Brunet and Struchiner [18]. The derivation is given in an additional file [Additional file 1]. Compared to Eq. (3), the PDE (5) has the advantage of being linear. Solving PDEs like Eqs. (3) and (5) is usually accomplished by transformation into an equivalent ordinary differential equation by the Method of Characteristics [15]. In the case of Eq. (3), the resulting ordinary differential equation is of Riccati type [19], which in general can only be solved numerically because an explicit representation of the general solution does not exist [20]. In the case of the equivalent Eq. (5), however, an explicit representation of the solution is indeed possible. As detailed in the additional file [see Additional file 1] it holds:
\[ \theta(t, a) = \int_{0}^{a} i\left( t - \delta, a - \delta \right) \exp\left\{ -\varphi_{t,a}(\delta) \right\} d\delta. \tag{6} \]
For brevity, in Eq. (6) it was set \( \varphi_{t,a}(x) = \int_{0}^{x} \left[ m_{1} - m_{0} - i \right] \left( t - x + \tau, a - x + \tau \right) d\tau \).
The explicit representation of the solution θ in Eq. (6) allows θ to be calculated with any prescribed accuracy, e.g. by Romberg integration [16], which we will use in the examples below. Applying the back-transformation p = θ/(1 + θ) yields the prevalence p.
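For later use, we note that Eq. (3) can also be expressed in terms of the mortality rate ratio R and the general mortality m = p m_{1} + (1 – p) m_{0}:
\[ \left( \partial_{t} + \partial_{a} \right) p = \left( 1 - p \right) \left\{ i - m \, \frac{ p \left( R - 1 \right) }{ 1 + p \left( R - 1 \right) } \right\}. \tag{7} \]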
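The rewriting rests on a short substitution, sketched here from the definitions in the text (the intermediate steps are not quoted from the source). With m_{1} = R m_{0}:

\[
\begin{aligned}
m &= p\, m_{1} + (1 - p)\, m_{0} = m_{0} \left\{ 1 + p\, (R - 1) \right\}, \\
m_{0} &= \frac{m}{1 + p\, (R - 1)}, \\
\Delta m &= m_{1} - m_{0} = (R - 1)\, m_{0} = \frac{m\, (R - 1)}{1 + p\, (R - 1)}.
\end{aligned}
\]

Inserting this expression for m_{1} – m_{0} into the PDE for p leaves only i, m, R and p on the right-hand side.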
Direct estimation: dementia in the female population of Europe
After the prevalence-odds θ in the years t_{1} and t_{2} have been calculated by Eq. (6), the associated prevalences p = θ/(1 + θ) are obtained. Figure 4 shows the age-specific prevalences for the years t_{1} = 1990 (dashed line) and t_{2} = 2010 (solid line). To demonstrate that our simulated prevalence has a reasonable range, we additionally plotted the surveyed values for European women reported in [8]. The proposed method to estimate the excess mortality Δm in the year t = 2000 is the direct application of Eq. (4). The partial derivative (∂_{t} + ∂_{a})p in Eq. (4) is approximated by a finite difference:
\[ \left( \partial_{t} + \partial_{a} \right) p(2000, a) \approx \frac{ p\left( t_{2}, a + \Delta T / 2 \right) - p\left( t_{1}, a - \Delta T / 2 \right) }{ \Delta T }. \]
Then, the excess mortality Δm can be estimated by plugging these numbers into Eq. (4). If the mortality rate m_{0} of the non-diseased is known, the age-specific mortality rate ratio can be calculated by R = 1 + Δm/m_{0}. Table 1 shows the true and estimated values for R at different ages and various choices of ΔT.
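The direct estimation can be illustrated with a small Python sketch. Several choices here are assumptions of the sketch rather than statements of the source: the derivative (∂_{t} + ∂_{a})p is approximated by the difference quotient along the characteristic through (2000, a), the prevalence at the midpoint is taken as the mean of the two survey values, and the test scenario uses hypothetical constant rates, for which the prevalence has a closed form and does not change over calendar time.

```python
import math


def prevalence(a, inc=0.01, dm=0.03):
    """Closed-form prevalence for constant rates, disease-free at age 0 (test scenario)."""
    theta = inc / (dm - inc) * (1.0 - math.exp(-(dm - inc) * a))
    return theta / (1.0 + theta)


def direct_estimate(prev_t1, prev_t2, inc_mid, age, delta_t):
    """Excess mortality at the midpoint from two cross-sectional prevalence curves."""
    p_early = prev_t1(age - delta_t / 2.0)   # survey at t1, cohort still younger
    p_late = prev_t2(age + delta_t / 2.0)    # survey at t2, same cohort, now older
    dp = (p_late - p_early) / delta_t        # difference quotient along the characteristic
    p_mid = 0.5 * (p_early + p_late)         # assumed midpoint prevalence
    return (inc_mid * (1.0 - p_mid) - dp) / (p_mid * (1.0 - p_mid))


dm_short = direct_estimate(prevalence, prevalence, 0.01, 80.0, 0.1)
dm_long = direct_estimate(prevalence, prevalence, 0.01, 80.0, 10.0)

m0_at_80 = 0.015                             # hypothetical mortality of the non-diseased
rr = 1.0 + dm_short / m0_at_80               # R = 1 + dm / m0
print(dm_short, dm_long, rr)
```

In this scenario the true excess mortality is 0.03, and the estimate with the wider spacing of the surveys deviates more from it than the estimate with the narrow spacing.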
From Table 1 we can see that the absolute relative error increases as the temporal difference ΔT between the cross-sections increases, and that it increases as the age decreases. In the extreme case (age 60, ΔT = 10), the absolute relative error reaches nearly 30%. This indicates that if the two cross-sectional studies are separated by more than three years (i.e., ΔT > 3), the method yields reliable results only in the higher age groups.
Bayesian estimation of excess mortality in male diabetics from Germany
The log-likelihood of the a posteriori distribution f(R | p) ∝ f(p | R) × f(R) is shown in Fig. 5. The black cross indicates the maximum a posteriori (MAP) estimator for these data, which is given by R_{MAP}(50) = 4.47 and R_{MAP}(90) = 1.39. We obtain the estimates for R(50) and R(90) including 95% credibility intervals as shown in Table 2.
These values agree well with the empirical findings from the Danish Diabetes Register [14], where values slightly below 4 and slightly above 1.5 have been found for ages 50 and 90 years, respectively.
In the sensitivity analysis with bivariate normal prior distributions, the MAP estimator changed only slightly to R_{MAP}(50) = 4.54 and R_{MAP}(90) = 1.38.
Discussion
In this work, we have described how the illness-death model can be used to obtain information about excess mortality when prevalence and incidence are given. It turns out that the excess mortality can be calculated from the incidence rate, the prevalence and the temporal change of the prevalence (see Eq. (4)). In data where these figures are estimable, insights into the excess mortality of people with chronic diseases compared to people without the disease can be gained.
As applications, simulated data about dementia and claims data about diabetes have been analyzed. For the dementia example we estimated the excess mortality directly, and for the diabetes data we formulated a Bayesian approach. Both methods were based on aggregated data only (age-specific prevalence and incidence rate) and do not require data from individual subjects. Aggregated data can be found frequently in the literature, which makes the proposed method suitable for many applications, especially when the research question is aimed at population-wide measures. Here, we have chosen aggregated data about diabetes from the statutory health insurance in Germany based on about 35 million men. Based on the age-specific prevalence in 2009, we used weakly informative priors for the mortality rate ratio R and the PDE (7) to estimate the a posteriori likelihood of R given the age-specific prevalence in 2015. In this way, the PDE can be seen as the data generating process underlying the prevalence data. In a sensitivity analysis, we used more informative prior distributions (bivariate normal) and found that the estimated values for the mortality rate ratios changed only slightly. The main reason for this robustness is the large number of people in the prevalence data.
Our approach has two limitations. The first limitation stems from the fact that Eqs. (3) and (5) are only valid if there is no migration into or out of the considered population, or if the prevalence of the chronic condition in migrants is similar to the prevalence in the resident population [21]. If migration happens on a considerable scale and if the prevalence in the migrants is substantially different from that in the residents, adaptations of Eq. (3) are possible [21]. The second limitation of our novel approach becomes visible in the simulation study about dementia: the two (or more) cross-sectional surveys for estimating the change of the prevalence should not be separated too much. In our simulation, the surveys should be conducted within a period of three years or less (i.e., ΔT ≤ 3) to have a relative error below 5%. If the two cross-sections are separated by ten years (ΔT = 10), the relative error reaches up to 30%. In the diabetes example, the two cross-sections were separated by six years (ΔT = 6). Based on this, we expect the relative errors of our estimates R(50) and R(90) to be about 10%. For comparison, the widths of the credibility intervals for our estimates R(50) and R(90) have a similar magnitude. Thus, we would conclude that a relative error of 10% in the mortality rate ratio is a rough estimate of the accuracy that can be obtained from our method applied to these data.
In the current analysis, no attempt has been made to examine the effect of smaller population sizes, i.e., how sampling uncertainty in the age-specific prevalence and incidence affects the estimates of the excess mortality. Furthermore, we have not analyzed the robustness of the estimation methods against misclassification error (i.e., false positive and false negative rates in the input prevalence and incidence data). Questions about sample sizes and misclassification are currently being analyzed and will be the subject of a future paper providing more technical details.
Conclusion
The described approach is useful to estimate the excess mortality of a chronic condition from aggregated incidence and prevalence data. The feasibility has been demonstrated in a simulation study about dementia and in claims data about diabetes in German men.
Abbreviations
MAP: Maximum a posteriori
NHIS: National Health Interview Survey
PDE: Partial differential equation
References
1. Brinks R, Landwehr S. Change rates and prevalence of a dichotomous variable: simulations and applications. PLoS One. 2015;10(3). https://doi.org/10.1371/journal.pone.0118955.
2. Brinks R, Hoyer A. Illness-death model: statistical perspective and differential equations. Lifetime Data Anal. 2018;24(4):743–54. https://doi.org/10.1007/s10985-018-9419-6.
3. Brinks R, Hoyer A, Landwehr S. Surveillance of the incidence of noncommunicable diseases (NCDs) with sparse resources: a simulation study using data from a National Diabetes Registry, Denmark, 1995–2004. PLoS One. 2016;11(3):e0152046. https://doi.org/10.1371/journal.pone.0152046.
4. Egeberg A, Kristensen LE. Impact of age and sex on the incidence and prevalence of psoriatic arthritis. Ann Rheum Dis. 2018;77:e19. https://doi.org/10.1136/annrheumdis-2017-211980.
5. Tamayo T, Brinks R, Hoyer A, Kuß OS, Rathmann W. The prevalence and incidence of diabetes in Germany. Dtsch Arztebl Int. 2016;113(11):177–82. https://doi.org/10.3238/arztebl.2016.0177.
6. National Center for Health Statistics of the Centers for Disease Control and Prevention (CDC). About the National Health Interview Survey. https://www.cdc.gov/nchs/nhis/about_nhis.htm. Accessed on 5 Apr 2019.
7. Fratiglioni L, Launer LJ, Andersen K, Breteler MM, Copeland JR, Dartigues JF, Lobo A, Martinez-Lage J, Soininen H, Hofman A. Incidence of dementia and major subtypes in Europe: a collaborative study of population-based cohorts. Neurologic Diseases in the Elderly Research Group. Neurology. 2000;54(11 Suppl 5):S10–5.
8. Lobo A, Launer LJ, Fratiglioni L, Andersen K, Di Carlo A, Breteler MM, Copeland JR, Dartigues JF, Jagger C, Martinez-Lage J, Soininen H, Hofman A. Prevalence of dementia and major subtypes in Europe: a collaborative study of population-based cohorts. Neurologic Diseases in the Elderly Research Group. Neurology. 2000;54(11 Suppl 5):S4–9.
9. Missov TI, Lenart A. Gompertz–Makeham life expectancies: expressions and applications. Theor Pop Bio. 2013;90:29–35.
10. Rait G, Walters K, Bottomley C, Petersen I, Iliffe S, Nazareth I. Survival of people with clinical diagnosis of dementia in primary care: cohort study. BMJ. 2010;341:c3584. https://doi.org/10.1136/bmj.c3584.
11. Gelman A, Stern HS, Carlin JB, Dunson DB, Vehtari A, Rubin DB. Bayesian data analysis. London: Chapman and Hall/CRC; 2013.
12. Goffrier B, Schulz M, Bätzing-Feigenbaum J. Administrative Prävalenzen und Inzidenzen des Diabetes mellitus von 2009 bis 2015. Versorgungsatlas. 2017. https://doi.org/10.20364/VA17.03.
13. Tönnies T, Hoyer A, Brinks R. Excess mortality for people diagnosed with type 2 diabetes in 2012 - estimates based on claims data from 70 million Germans. Nutr Metab Cardiovasc Dis. 2018;28(9):887–91. https://doi.org/10.1016/j.numecd.2018.05.008.
14. Carstensen B, Kristensen JK, Ottosen P, Borch-Johnsen K. Steering Group of the National Diabetes Register. The Danish National Diabetes Register: trends in incidence, prevalence and mortality. Diabetologia. 2008;51(12):2187–96. https://doi.org/10.1007/s00125-008-1156-z.
15. Polyanin AD, Zaitsev VF, Moussiaux A. Handbook of first-order partial differential equations: CRC Press; 2001.
16. Dahlquist G, Björck A. Numerical methods. Englewood Cliffs: Prentice-Hall; 1974.
17. Brinks R, Landwehr S. A new relation between prevalence and incidence of a chronic disease. Mathematical Medicine and Biology. 2015;32(4):425–35. https://doi.org/10.1093/imammb/dqu024.
18. Brunet RC, Struchiner CJ. A non-parametric method for the reconstruction of age- and time-dependent incidence from the prevalence data of irreversible diseases with differential mortality. Theor Pop Bio. 1999;56(1):76–90.
19. Brinks R. Illness-death model in chronic disease epidemiology: characteristics of a related differential equation and an inverse problem. Comp Math Meth Med. 2018. https://doi.org/10.1155/2018/5091096.
20. Kamke E. Differentialgleichungen Lösungsmethoden und Lösungen. Leipzig: Teubner Verlag; 1983.
21. Brinks R, Landwehr S. Age- and time-dependent model of the prevalence of non-communicable diseases and application to dementia in Germany. Theor Popul Biol. 2014;92:62–8. https://doi.org/10.1016/j.tpb.2013.11.006.
Acknowledgements
The authors wish to thank the Zentralinstitut für Kassenärztliche Versorgung, Berlin, for making the claims data available.
Funding
This research did not receive any funding.
Author information
Contributions
RB had the initial idea for this work, developed the source code and drafted the manuscript. TT and AH critically discussed the ideas and revised the manuscript. All authors gave substantial intellectual contributions, read and approved the final manuscript.
Corresponding author
Correspondence to Ralph Brinks.
Ethics declarations
Ethics approval and consent to participate
This study does not involve data from human participants (simulation about dementia) or relies solely on publicly available secondary data (aggregated claims data [12]). Therefore, consent to participate is not required. The Ethics Board of the University Hospital Duesseldorf has confirmed that in the case of published data, no review by the Ethics Board is necessary.
Consent for publication
Not necessary because this manuscript does not contain data from any individual person.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 2:
Script (plain text file, accessible via any text editor, e.g., Notepad, GNU Emacs etc) for the dementia simulation study, intended to use with the statistical software R (The R Foundation of Statistical Software). (R 4 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Keywords
 Chronic disease epidemiology
 Multistate model
 Prevalence
 Incidence
 Dementia
 Diabetes
 Partial differential equation
 Bayes estimation