RiskDiff: a web tool for the analysis of the difference due to risk and demographic factors for incidence or mortality data
© Valls et al; licensee BioMed Central Ltd. 2009
Received: 16 July 2009
Accepted: 18 December 2009
Published: 18 December 2009
Analysing the observed differences in the incidence of or mortality from a particular disease between two different situations (such as time points, geographical areas, gender or other social characteristics) can be useful for both scientific and administrative purposes. From an epidemiological and public health point of view, it is of great interest to assess the effect of demographic factors on these observed differences, in order to isolate the effect of the risk of developing or dying from the disease. The method proposed by Bashir and Estève, which splits the observed variation into three components (risk, population structure and population size), is a common choice in practice.
A web-based application called RiskDiff (available at http://rht.iconcologia.net/riskdiff.htm) has been implemented to perform this kind of statistical analysis, providing text and graphical summaries. The code of the implemented R functions is also provided. An application to cancer mortality data from Catalonia is used for illustration.
Combining epidemiological and demographic factors is crucial when analysing the incidence of or mortality from a disease, especially if the population pyramids differ substantially. The tool implemented may help to promote and disseminate the use of this method, supporting epidemiological interpretation and decision making in public health.
The analysis of the observed differences in the incidence or mortality of a given disease can be of great interest for both scientific and administrative purposes. Studies frequently compare the number of incident cases or deaths in two given situations, with the aim of quantifying the observed differences for further epidemiological interpretation and to support decision making in public health. In this context, time-trend analyses are usually performed to study the historical evolution of risk and to assess the occurrence of a disease over a certain period of time, for instance by comparing two time points in a simple trend analysis. Similarly, geographical variation in the risk of a disease can be evaluated by comparing incidence or mortality rates between two areas. These comparisons are usually reported as the absolute difference in the observed number of cases (incident cases or deaths) or as the difference in crude rates (usually per 100 000 persons), and sometimes the percentage change is also computed. Although crude rates can be used to compare different diseases in the same population, they are not useful for comparing rates of the same disease in different populations or over time, because they depend on the age structure of each population. To overcome this, standardized measures of risk based on a common reference population are used to compare the evolution of risk (the world standard population is a common choice [5, 6]), and the percentage change is then computed. However, these changes may be partly attributable to demographic factors and not only to risk, especially if the population pyramids of the two situations differ substantially. A change in population size over time can explain part of the change in the number of cases, through the corresponding increase (or decrease) in the number of persons at risk of developing or dying from the disease.
In addition, changes in the age structure of the populations involved can also lead to substantial changes in the number of cases. In this regard, for a number of diseases such as cancer, ageing is known to be clearly associated with molecular, cellular and physiological changes that influence carcinogenesis and subsequent cancer growth, and an increase in cases among the oldest age groups is therefore expected. Another situation arises when migration flows change the population structure. For example, an increase in measles cases was recently reported in Catalonia, which has been partly attributed to immigration from less developed countries with poor measles vaccination coverage.
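A minimal Python sketch of direct age standardization illustrates the points above about crude rates, standardized rates and age structure. The three age groups, the counts and the standard population weights are hypothetical illustration values. They are not the world standard population, and they are not data from this study.

```python
def crude_rate(cases, pop, per=100_000):
    """Crude rate: total cases divided by total population, per `per` persons."""
    return per * sum(cases) / sum(pop)


def direct_standardized_rate(cases, pop, std_pop, per=100_000):
    """Directly standardized rate: age-specific rates weighted by the age
    structure of a common standard population (hypothetical weights here)."""
    total_std = sum(std_pop)
    return per * sum((c / p) * (s / total_std) for c, p, s in zip(cases, pop, std_pop))


# Hypothetical counts for three age groups (young, middle, old). Both
# populations share the same age-specific rates (1, 10 and 100 per 10 000),
# but population B is older than population A.
cases_a, pop_a = [50, 300, 1000], [500_000, 300_000, 100_000]
cases_b, pop_b = [20, 300, 3000], [200_000, 300_000, 300_000]
std_pop = [60, 30, 10]  # hypothetical standard population weights

print(round(crude_rate(cases_a, pop_a), 1))  # 150.0
print(round(crude_rate(cases_b, pop_b), 1))  # 415.0
print(round(direct_standardized_rate(cases_a, pop_a, std_pop), 1))  # 136.0
print(round(direct_standardized_rate(cases_b, pop_b, std_pop), 1))  # 136.0
```

Population B has almost three times the crude rate of A, even though the risk in each age group is identical. The two standardized rates coincide. A standardized comparison therefore removes the effect of structure, but it does not quantify that effect.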
Bashir and Estève developed a method for partitioning the difference in the incidence of or mortality from a disease between two groups. The method quantifies the percentage of change attributable to demographic factors (population size and structure), as opposed to the percentage attributable to changes in the risk of developing or dying from the disease. The method first computes the incidence or mortality that one would have observed if the population size and structure were the same in both groups. It then attributes the difference between this value and the net change to demographic factors. In addition, the change attributed to demographic factors can itself be split into the part due to variation in population size and the part due to changes in population structure. This method can therefore evaluate differences in mortality (or incidence) due to risk and to demographic factors. This is not directly possible with standardized mortality (or incidence) rates: because they use a common standard reference population, differences between them can only be attributed to risk.
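In symbols, the three-way split can be sketched as follows. The notation is ours and is not necessarily that of Bashir and Estève:

- $N_j$, $P_j$ and $S_j = N_j/P_j$ are the number of cases, the population and the crude rate in group $j$ (1 = baseline, 2 = comparison).
- $r_{ji}$ and $P_{ji}$ are the rate and the population in age group $i$.
- $\bar{S}_1$ applies the baseline age-specific rates to the age structure of the comparison group.

Writing the population-size term as $S_2(P_2 - P_1)$ is an assumption of this sketch. It is the allocation that reproduces the component values reported for the Catalan data in the Results.

```latex
\begin{align*}
S_j &= \frac{N_j}{P_j} = \sum_i r_{ji}\,\frac{P_{ji}}{P_j}, \qquad
\bar{S}_1 = \sum_i r_{1i}\,\frac{P_{2i}}{P_2} \\
N_2 - N_1 &= \underbrace{S_2\,(P_2 - P_1)}_{\text{population size}}
           + \underbrace{P_1\,(\bar{S}_1 - S_1)}_{\text{population structure}}
           + \underbrace{P_1\,(S_2 - \bar{S}_1)}_{\text{risk}}
\end{align*}
```

The identity holds because the right-hand side expands to $S_2 P_2 - S_1 P_1 = N_2 - N_1$. Dividing each term by $N_1 = P_1 S_1$ gives percentages of change. The structure and risk terms then equal the proportional rate changes $(\bar{S}_1 - S_1)/S_1$ and $(S_2 - \bar{S}_1)/S_1$.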
The main aim of this paper is to present a set of R functions that we have implemented based on the method proposed by Bashir and Estève. These functions also provide convenient tables and graphical representations. To make them more widely available, we have implemented a web tool called RiskDiff (publicly available at http://rht.iconcologia.net/riskdiff.htm), with which users can easily perform their own analyses. The code of the R functions is also freely available on the same web page.
Finally, to illustrate the use of this web tool, we analyse the differences in the number of cancer deaths in Catalonia between 1985 and 2004. This 20-year period is long enough to be relevant from an epidemiological point of view.
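$$\frac{S_2 - S_1}{S_1} = \frac{\bar{S}_1 - S_1}{S_1} + \frac{S_2 - \bar{S}_1}{S_1}$$
where $S_1$ and $S_2$ are the crude rates (per 100 000 people) for the baseline and comparison groups respectively. $\bar{S}_1$ is an intermediate rate, obtained by applying the age-specific rates of the baseline group to the population structure of the comparison group (that is, using the comparison group as the reference population). Thus, $(S_2 - S_1)/S_1$ represents the proportional change between the observed rates in the two groups. This change is partitioned into the proportional change due to population structure, $(\bar{S}_1 - S_1)/S_1$, and the proportional change due to differences in risk, $(S_2 - \bar{S}_1)/S_1$.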
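A minimal Python sketch of this rate partition follows. It assumes that the inputs are age-specific counts and populations for the baseline group (1) and the comparison group (2). It computes $S_1$ and $S_2$ as crude rates, and it computes $\bar{S}_1$ by applying the baseline age-specific rates to the age structure of the comparison group. The function name and the two-age-group data are hypothetical. This is not the RiskDiff code.

```python
def rate_partition(cases_1, pop_1, cases_2, pop_2, per=100_000):
    """Partition S2 - S1 into structure (S1bar - S1) and risk (S2 - S1bar).

    Group 1 is the baseline and group 2 the comparison group; each argument
    is a vector of age-specific counts (cases) or populations.
    """
    p1, p2 = sum(pop_1), sum(pop_2)
    s1 = sum(cases_1) / p1  # crude rate, baseline group
    s2 = sum(cases_2) / p2  # crude rate, comparison group
    # Intermediate rate: baseline age-specific rates applied to the age
    # structure of the comparison group.
    s1_bar = sum((c / p) * (q / p2) for c, p, q in zip(cases_1, pop_1, pop_2))
    return {
        "net": per * (s2 - s1),
        "structure": per * (s1_bar - s1),
        "risk": per * (s2 - s1_bar),
        "net_pct": 100 * (s2 - s1) / s1,
        "structure_pct": 100 * (s1_bar - s1) / s1,
        "risk_pct": 100 * (s2 - s1_bar) / s1,
    }


# Hypothetical data for two age groups (young, old).
res = rate_partition(cases_1=[10, 100], pop_1=[100_000, 50_000],
                     cases_2=[5, 150], pop_2=[100_000, 100_000])
for key, value in res.items():
    print(f"{key:>13}: {value:7.2f}")
```

With these made-up numbers, the structure component (+43.18%) more than offsets the fall in risk (-37.50%), so the crude rate rises by 5.68% overall.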
Two functions have been implemented in R. The first, risk.diff(), takes four arguments, cases.init, cases.end, pop.init and pop.end. These are vectors of the same length containing, for each age group, the number of cases (or deaths) and the population of the two groups. The function returns two tables summarizing the differences observed between the groups, together with a short text to facilitate interpretation. The second, plot.risk.diff(), generates a graphical representation of the results. These functions are available as a source text file, together with some examples of use. The web interface to these functions was implemented in PHP. The functions are executed on a remote Linux server, and results are provided online.
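For readers who do not use R, the following is a hypothetical Python analogue of risk.diff(). It is not the distributed R code, and its output is a plain dictionary rather than the tables and text that the R function produces. It takes the same four age-specific vectors, with the argument names rewritten as cases_init, cases_end, pop_init and pop_end. It splits the difference in the number of cases into population size, population structure and risk. Computing the size term as $S_2(P_2 - P_1)$, with the structure and risk terms scaled by the baseline population $P_1$, is an assumption of this sketch. It was chosen because it reproduces the component values reported for the Catalan data in the Results.

```python
def risk_diff(cases_init, cases_end, pop_init, pop_end):
    """Hypothetical Python analogue of the R function risk.diff().

    Splits the difference in the number of cases between the baseline group
    (*_init) and the comparison group (*_end) into population size,
    population structure and risk components. All four arguments are
    age-specific vectors of the same length.
    """
    if not len(cases_init) == len(cases_end) == len(pop_init) == len(pop_end):
        raise ValueError("all four vectors must have the same length")
    n1, n2 = sum(cases_init), sum(cases_end)
    p1, p2 = sum(pop_init), sum(pop_end)
    s1, s2 = n1 / p1, n2 / p2  # crude rates
    # Baseline age-specific rates applied to the comparison age structure.
    s1_bar = sum((c / p) * (q / p2)
                 for c, p, q in zip(cases_init, pop_init, pop_end))
    components = {
        "size": s2 * (p2 - p1),  # assumption: size term S2 * (P2 - P1)
        "structure": p1 * (s1_bar - s1),
        "risk": p1 * (s2 - s1_bar),
    }
    result = {"net": n2 - n1, "net_pct": 100 * (n2 - n1) / n1}
    for name, value in components.items():
        result[name] = value
        result[name + "_pct"] = 100 * value / n1  # relative to baseline cases
    return result


# Hypothetical two-age-group example (not the Catalan data).
res = risk_diff(cases_init=[10, 100], cases_end=[5, 150],
                pop_init=[100_000, 50_000], pop_end=[100_000, 100_000])
for key, value in res.items():
    print(f"{key:>13}: {value:8.2f}")
```

Here the net change of 45 cases splits into +38.75 (size), +47.50 (structure) and -41.25 (risk). The percentages are relative to the baseline number of cases.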
For the example illustrated in this paper, we used cancer mortality data for the period 1985-2004 provided by the Catalan Mortality Registry. The Catalan population was about 6 million people in 1985 and nearly 7 million in 2004. Population pyramids were provided by the Catalan Statistical Institute. The numbers of cancer deaths and the populations at risk were grouped in 5-year age bands. Registered deaths from all cancer sites are included except those from non-melanoma skin cancer (code C44 in ICD-10).
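The data preparation described above (excluding C44 and grouping deaths into 5-year age bands) can be sketched in Python as follows. The record layout, the helper names and the open-ended top band at 85 years are hypothetical, since the paper does not state them. Treating every ICD-10 code that starts with "C" (C00-C97, malignant neoplasms) as cancer is also an assumption of this sketch.

```python
from collections import Counter


def age_band(age, width=5, top=85):
    """Label of the 5-year age band for an age in completed years.
    Ages >= top go into an open-ended band (hypothetical choice)."""
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def is_included(icd10_code):
    """True for malignant neoplasms (ICD-10 C00-C97) other than C44."""
    code = icd10_code.strip().upper()
    return code.startswith("C") and not code.startswith("C44")


def deaths_by_band(records):
    """Count included deaths per age band from (age, ICD-10 code) pairs."""
    return Counter(age_band(age) for age, code in records if is_included(code))


# Hypothetical death records: (age at death, underlying cause as ICD-10 code).
records = [(47, "C50.9"), (49, "C34.1"), (63, "C44.3"), (88, "C18.7"), (91, "C61")]
print(sorted(deaths_by_band(records).items()))  # [('45-49', 2), ('85+', 2)]
```

The same banding would be applied to the population counts, so that deaths and populations line up age group by age group.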
Number of observed deaths from cancer (all sites except non-melanoma skin cancer) and population, by age group in years, for 1985 and 2004, women in Catalonia.
Number of observed deaths from cancer (all sites except non-melanoma skin cancer) and population, by age group in years, for 1985 and 2004, men in Catalonia.
Regarding the changes in observed cancer mortality in Catalonia, a relatively large increase in both the number of deaths and the crude rate was observed over the period 1985 to 2004. However, a more thorough analysis reveals that the risk of dying from cancer clearly declined. All percentages below are relative to the 1985 values.
For women, the net change in the crude rate was 17 deaths per 100 000 person-years (from 151 to 168), an increase of 11.02%. However, our results indicate a decrease of 31.55 deaths per 100 000 person-years (21%) attributable to changes in risk, while an increase of 48.19 deaths per 100 000 person-years (32%) was due to changes in population structure, i.e. the ageing of the Catalan population. In terms of the absolute number of deaths, the net change was 1088 deaths (from 4629 to 5717), an increase of 23.5%. This can be partitioned into three parts: the increase in population size (577.67 deaths, 12%), the ageing of the population (1477.41 deaths, 32%) and risk, which accounts for a decrease of 967.08 deaths (21%).
For men, the net change in the crude rate was 63 deaths per 100 000 person-years (from 225 to 288), an increase of 27.8%. As for women, a decrease of 8.77 deaths per 100 000 person-years (4%) was attributable to changes in risk, while an increase of 71.44 deaths per 100 000 person-years (32%) was due to changes in population structure. In terms of the absolute number of deaths, the net change was 2918 deaths (from 6632 to 9550), an increase of 44%. Again, this can be partitioned into three parts: the increase in population size (1073.32 deaths, 16%), the ageing of the population (2102.76 deaths, 32%) and risk, which accounts for a decrease of 258.09 deaths (4%).
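As a quick arithmetic check on these figures, the three components reported for each sex should add up to the net change in the number of deaths. Dividing each component by the 1985 number of deaths should also reproduce the reported percentages. The sketch below uses only numbers stated in this section. The dictionary layout and the helper name are ours.

```python
# Figures reported above (number of cancer deaths, 1985 vs 2004).
reported = {
    "women": {"n1": 4629, "n2": 5717, "size": 577.67, "structure": 1477.41, "risk": -967.08},
    "men": {"n1": 6632, "n2": 9550, "size": 1073.32, "structure": 2102.76, "risk": -258.09},
}


def check(r):
    """Net change, sum of the three components, and each component as a
    percentage of the baseline (1985) number of deaths."""
    net = r["n2"] - r["n1"]
    total = r["size"] + r["structure"] + r["risk"]
    pct = {k: 100 * r[k] / r["n1"] for k in ("size", "structure", "risk")}
    return net, total, pct


for sex, r in reported.items():
    net, total, pct = check(r)
    shares = ", ".join(f"{k} {v:.1f}%" for k, v in pct.items())
    print(f"{sex}: net {net}, components sum to {total:.2f} ({shares})")
```

Both sums match the net change to within rounding (1088.00 and 2917.99). The percentages agree with the rounded values reported: 12%, 32% and 21% for women, and 16%, 32% and 4% for men.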
Discussion and conclusions
Evaluating the differences in the incidence or mortality of a disease between two given situations (such as time points, geographical areas or males versus females) without adjusting for the populations at risk involved can lead to incorrect conclusions. It is therefore necessary to take demographic factors, i.e. population size and population structure, into account in order to determine more precisely which part of the observed change is due to risk. The method presented by Bashir and Estève is a good solution and a common choice in practice. This work presents a publicly available web tool that performs this analysis and provides graphical summaries and tables. Our aim is to disseminate the method and promote its use in epidemiology and public health, including at an applied level.
The results obtained from the analysis of cancer mortality in Catalonia illustrate the method and its application. They show why changes in the population must be taken into account. A simple analysis would have concluded that cancer mortality in Catalonia increased by 23% for women and 44% for men over the 20-year period analysed (1985 to 2004). This method instead shows that the risk of dying from cancer actually decreased by about 21% for women and 4% for men. The apparent net increase was mainly due to the growth of the population (12% for women and 16% for men) and the ageing of society (32% for both women and men). The use of this method is therefore highly recommended when comparing populations that differ substantially in size or structure. The method could also be used to ascertain the effect of immigration on the assessment of risk when comparing two time periods, as in the case of Catalonia [13, 16, 17]. As in other regions of Europe, the observed decline in cancer mortality in Catalonia over this period may be due to a number of factors, such as advances in cancer treatment and diagnostic techniques and the decrease in the prevalence of smoking.
Regarding statistical issues, the method developed by Bashir and Estève does not include a procedure for assessing whether the observed differences are statistically significant, and it is unclear how such a hypothesis could be tested. RiskDiff should therefore be regarded as a descriptive tool for mortality or incidence data. For a population-based registry, the observed differences can be considered the true ones, so the differences described refer directly to the population. However, when the data come from a sample of the general population, these differences must be interpreted with caution. In the future, a nonparametric procedure such as the bootstrap could be added to RiskDiff so that confidence intervals for the observed differences could be provided.
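The text above mentions a nonparametric bootstrap as possible future work. The sketch below illustrates the general idea only, and it uses a different, parametric variant. Each age-specific count is resampled from a Poisson distribution with mean equal to the observed count, the population denominators are treated as fixed, and a percentile interval is taken for the risk term. That term is computed as $P_1(S_2 - \bar{S}_1)$, with $\bar{S}_1$ the baseline age-specific rates applied to the comparison age structure. This is not a feature of RiskDiff. The function names, the Poisson assumption and the risk-term formula are all assumptions of this sketch.

```python
import random


def poisson_draw(lam, rng):
    """One Poisson(lam) draw: the number of arrivals of a rate-1 Poisson
    process in [0, lam], generated from exponential waiting times."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def risk_component(cases_1, pop_1, cases_2, pop_2):
    """Risk term P1 * (S2 - S1bar) of the difference in number of cases."""
    p1, p2 = sum(pop_1), sum(pop_2)
    s2 = sum(cases_2) / p2
    # S1bar: baseline age-specific rates applied to the comparison age structure.
    s1_bar = sum((c / p) * (q / p2) for c, p, q in zip(cases_1, pop_1, pop_2))
    return p1 * (s2 - s1_bar)


def bootstrap_ci(cases_1, pop_1, cases_2, pop_2, reps=500, alpha=0.05, seed=1):
    """Percentile interval for the risk term.

    Assumption of this sketch (not part of RiskDiff): each age-specific count
    is resampled as Poisson with mean equal to the observed count, and the
    population denominators are held fixed (a parametric bootstrap).
    """
    rng = random.Random(seed)
    stats = sorted(
        risk_component([poisson_draw(c, rng) for c in cases_1], pop_1,
                       [poisson_draw(c, rng) for c in cases_2], pop_2)
        for _ in range(reps)
    )
    return stats[int(reps * alpha / 2)], stats[int(reps * (1 - alpha / 2)) - 1]


# Hypothetical two-age-group data (not the Catalan data).
args = ([10, 100], [100_000, 50_000], [5, 150], [100_000, 100_000])
lo, hi = bootstrap_ci(*args)
print(f"risk term {risk_component(*args):.2f}, 95% interval {lo:.1f} to {hi:.1f}")
```

A nonparametric version would resample individual records instead of drawing Poisson counts. For registry data with very large counts, a faster Poisson sampler (for example from NumPy) would replace the simple waiting-time method used here.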
In conclusion, analysing incidence or mortality data without taking demographic effects into account can lead to results that are not easily usable for policy making. In this context, data on the absolute number of cases and on demographic determinants are highly relevant for planning purposes and for assessing future needs. This work supports the idea of combining epidemiology with demography when performing statistical analyses of the incidence of or mortality from a disease. This is especially important in dynamic populations that are also affected by other risk factors, which may vary across time, gender or geographic region.
Availability and requirements
Project name: RiskDiff
Project home page: The web tool can be used through the following website: http://rht.iconcologia.net/riskdiff.htm. In addition, the files for the R functions and examples of use are available as supplementary material (Additional file 1) and can also be downloaded from the website.
Operating system: Platform independent for accessing the public web server
Programming language: R and PHP
Requirement: R statistical software available at http://www.r-project.org/ is required for the functions implemented.
Any restriction to use by non-academics: None
- Hakulinen T, Hakama M: Predictions of epidemiology and the evaluation of cancer control measures and the setting of policy priorities. Soc Sci Med. 1991, 33 (12): 1379-1383. 10.1016/0277-9536(91)90282-H.
- Esteve J, Benhamou E, Raymond L: Statistical methods in cancer research. Volume IV. Descriptive epidemiology. IARC Sci Publ. 1994, (128): 1-302.
- Breslow NE, Day NE: Statistical methods in cancer research. Volume II--The design and analysis of cohort studies. IARC Sci Publ. 1987, (82): 1-406.
- Sasieni PD, Adams J: Standardized lifetime risk. Am J Epidemiol. 1999, 149 (9): 869-875.
- Segi M: Cancer mortality for selected sites in 24 countries (1950-57). 1960, Sendai: Tohoku University School of Public Health
- Doll R, Payne P, Waterhouse J: Cancer incidence in five continents: a technical report. 1966, Berlin: Springer-Verlag
- Balducci L, Ershler WB: Cancer and ageing: a nexus at several levels. Nat Rev Cancer. 2005, 5 (8): 655-662. 10.1038/nrc1675.
- Bray F, Moller B: Predicting the future burden of cancer. Nat Rev Cancer. 2006, 6 (1): 63-74. 10.1038/nrc1781.
- Dominguez A, Torner N, Barrabeig I, Rovira A, Rius C, Cayla J, Plasencia E, Minguell S, Sala MR, Martinez A, et al: Large outbreak of measles in a community with high vaccination coverage: implications for the vaccination schedule. Clin Infect Dis. 2008, 47 (9): 1143-1149. 10.1086/592258.
- Bashir S, Esteve J: Analysing the difference due to risk and demographic factors for incidence or mortality. Int J Epidemiol. 2000, 29 (5): 878-884. 10.1093/ije/29.5.878.
- R Development Core Team: R: a language and environment for statistical computing. 2007, Vienna, Austria: R Foundation for Statistical Computing
- PHP Hypertext preprocessor. [http://www.php.net]
- Catalan Statistical Institute. [http://www.idescat.cat]
- World Health Organization: International Statistical Classification of Diseases and Related Health Problems 10th Revision, Geneva. 2007
- Ribes J, Cleries R, Buxo M, Ameijide A, Valls J, Gispert R: Predictions of cancer incidence and mortality in Catalonia to 2015 by means of Bayesian models. Med Clin. 2008, 131 (Suppl): 32-41.
- Cabré A, Domingo A: Demografia i immigració, 1991-2005. Papers de demografia. 2007, 324: 1-32.
- Non-national populations in the EU Member States. [http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-NK-06-008/EN/KS-NK-06-008-EN.PDF]
- Gispert R, Clèries R, Puigdefàbregas A, Freitas A, Esteban L, Ribes J: Cancer mortality trends in Catalonia, 1985-2004. Med Clin. 2008, 131 (Suppl): 25-31.
- Karim-Kos HE, de Vries E, Soerjomataram I, Lemmens V, Siesling S, Coebergh JW: Recent trends of cancer in Europe: a combined approach of incidence, survival and mortality for 17 cancer sites since the 1990s. Eur J Cancer. 2008, 44 (10): 1345-1389. 10.1016/j.ejca.2007.12.015.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2458/9/473/prepub