Identifying the determinants of premature mortality in Russia: overcoming a methodological challenge

Background It is thought that excessive alcohol consumption is related to the high mortality among working age men in Russia. Moreover it has been suggested that alcohol is a key proximate driver of the very sharp fluctuations in mortality seen in this group since the mid-1980s. Designing an individual-level study suitable to address the potential acute effects of alcohol consumption on mortality in Russia has posed a challenge to epidemiologists, especially because of the need to identify factors that could underlie the rapid changes up and down in mortality rates that have been such a distinctive feature of the Russian mortality crisis. In order to address this study question which focuses on exposures acting shortly before sudden death, a cohort would be unfeasibly large and would suffer from recruitment bias.
Methods Although the situation in Russia is unusual, with a very high death rate characterised by many sudden and apparently unexpected deaths in young men, the methodological problem is common to research on any cause of death where many deaths are sudden.
Results We describe the development of an innovative approach that has overcome some of these challenges: a case-control study employing proxy informants and external data sources to collect information about proximate determinants of mortality.
Conclusion This offers a set of principles that can be adopted by epidemiologists studying sudden and unexpected deaths in other settings.


Background
It is thought that excessive alcohol consumption is related to the high mortality among working age men in Russia. Specifically, it has been suggested that alcohol is a key proximate driver of the very sharp fluctuations in mortality seen in this group since the mid-1980s including a seven year period when, on average, male life expectancy at birth fell by more than one year for each calendar year [1][2][3][4]. Most of the evidence comes from the analysis of routine data and from cross-sectional surveys of alcohol consumption, while in contrast there have been relatively few studies linking the drinking behaviour of individuals to mortality in Russia. Extrapolating the alcohol-mortality relationships found in research undertaken in Western countries to Russia is likely to be inappropriate because, along with many other parts of the former Soviet Union, it has a particular pattern of drinking characterised by binge-drinking of large volumes of spirits on one occasion, a pattern that is believed to be particularly dangerous [5].

The challenge
Designing an individual-level study to investigate these issues raises many methodological challenges. The initial hypothesis is that the sharp fluctuations in working-age mortality that have characterised Russian demographic trends since the mid-1980s may be due to acute or very short-term effects of alcohol consumption. An individual's pattern of drinking may also change considerably over a short period of time. Thus, what is required is a design for collecting valid information about individual characteristics and drinking behaviours that can then be related to the risk of death in the following 12 or 18 months.
The two classic analytic study designs are cohort and case-control studies. However, in their conventional forms, neither design is ideal to address this problem. Cohort studies typically measure exposure at a baseline examination or interview and then follow up the study subjects. If mortality is the endpoint, follow-up is usually for years. This design is particularly suited to studying mortality effects of relatively stable patterns of exposure, such as smoking, or acute or episodic exposures which elevate mortality over an appreciable period of time such as exposure to radiation and subsequent risk of solid tumours. In the case of short-term mortality effects of acute and intrinsically unstable patterns of behaviour including binge drinking, a conventional cohort design is not optimal. As follow-up time accumulates, classification of people according to baseline reports of frequency and intensity of binge drinking becomes increasingly uninformative about exposures that may be occurring shortly before death. One possible solution to this would be to have a cohort study of such size that sufficient numbers of events would accrue within a short period of follow-up. In most instances the huge size of the required cohort would make this unfeasible. Another alternative would be to undertake regular resurveys of the cohort to update exposure information about drinking patterns, although this would be logistically difficult, expensive and likely to result in selective loss to follow-up of heavy drinkers as discussed below.
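The scale problem described above can be made concrete with a rough calculation. The following is a minimal sketch, assuming a constant annual death rate and no loss to follow-up; the function name and the input values (1000 deaths, a 1% annual death rate) are hypothetical illustrations and are not taken from the study.

```python
import math


def required_cohort_size(events_needed: int,
                         annual_death_rate: float,
                         followup_years: float) -> int:
    """Smallest cohort expected to yield `events_needed` deaths.

    Simplifying assumptions (not from the paper): a constant annual
    death rate and no loss to follow-up.
    """
    if not 0 < annual_death_rate < 1:
        raise ValueError("annual_death_rate must be between 0 and 1")
    if followup_years <= 0:
        raise ValueError("followup_years must be positive")
    # Probability that a single subject dies during follow-up.
    p_death = 1 - (1 - annual_death_rate) ** followup_years
    return math.ceil(events_needed / p_death)


# Hypothetical inputs chosen only to show the order of magnitude.
short_followup = required_cohort_size(1000, 0.01, 1)
long_followup = required_cohort_size(1000, 0.01, 10)
print(short_followup, long_followup)
```

Under these assumed inputs, restricting follow-up to one year requires a cohort roughly ten times larger than a ten-year follow-up would, which is the trade-off that makes a short-follow-up cohort hard to justify.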
Some cohort studies suffer from another problem in terms of their suitability for studying the mortality effects of heavy drinking, or other outcomes associated with behavioural problems and social dysfunction. This is the selective exclusion of those at the ends of the spectrum, such as heavy or problem drinkers [6]. In a study which relied on regular resurveys, those individuals who had become heavier or more problematic drinkers since baseline would be less likely to take part in resurveys than others. Additionally, studies for which recruitment requires subjects to travel from their home address, for example to an appointment in a clinic for examination, are also likely to exclude problem drinkers differentially. This is a well recognised phenomenon in cross-sectional studies, but may be particularly serious when cohort recruitment requires considerable commitment from the potential participants. In the Russian context, two of the larger and well known cohorts (the Lipid Research Clinics and the Novosibirsk cohorts) [7,8] both required subjects to attend for a medical examination as part of the recruitment protocol. These studies are likely to have differentially excluded people who had serious alcohol problems at the time of recruitment.
In contrast, a case-control study in this context would focus upon obtaining information on the behaviours and characteristics of subjects in the most recent 12 month period, which for the cases (deaths) would be the year before death. From this perspective, the case-control design is more suitable than a cohort design. Certainly, retrospectively collected information about the immediate past might be regarded as more reliable and accurate than information about behaviours and characteristics much further back in time, as is often attempted in case-control studies.
If a case-control design is chosen it will require a departure from the usual approaches because, the cases having died, it must use proxy informants. Examples of such studies exist in the literature, particularly in investigations of risk factors for suicide [9,10] and violent death [11,12]. Other risk factors including drug use, smoking [13][14][15] and dietary factors [16] have also been explored using case-control designs with proxy informants. There is less literature on differential response bias between cases and controls based on proxy interviews compared to when the index cases and controls are themselves interviewed [17][18][19], and it is almost impossible to evaluate the extent of such bias when cases have died.
A case-control study in which the cases are dead does, however, have an advantage compared to the majority of studies where cases are of diagnosed disease. In a population with a well-functioning vital registration system, 100% case-ascertainment is feasible. Moreover, even if detailed proxy-based information is not available for all, there will usually be sufficient routine data collected at death registration to assess the representativeness of the cases with full information. If the target group of cases comprises all deaths occurring in a defined population, identifying the appropriate control sampling frame is straightforward: it is the whole population of the area from which the deaths are drawn.
Our interest in alcohol as the main exposure presents problems for case-control studies as it does for cohorts, particularly if, as is the case here, we wish to look at the role of heavy drinking. As already mentioned above, cross-sectional surveys are likely to differentially fail to interview people who drink heavily. In terms of actual methods, a case-control study is very similar to a cross-sectional study in that one approaches people to be interviewed without a history of prior contact and engagement in the research. However, the obligatory use of proxies may actually help in one respect, in that the drinking behaviours of the cases and controls may be less strongly linked to the process of recruitment of proxies than they are to recruitment of the index subjects themselves.
In the rest of this paper we describe the design and implementation of an innovative study in Russia that illustrates how these challenges and issues can be addressed. Although some aspects of the design reflect the specific situation in Russia, we believe that it offers a set of principles that can usefully be adopted by epidemiologists studying sudden and unexpected deaths in other settings.

The Izhevsk Family Case-control study
The Izhevsk Family Study was established to investigate whether patterns of alcohol consumption were linked to short-term risk of death among Russian men of working age in a typical, medium-sized Russian industrial town, Izhevsk. As already highlighted, the key challenge was to design a study which was able to capture information about exposures hypothesised to lead to mortality in the period immediately prior to death and to obtain a representative set of living controls with which to compare them.
Izhevsk has a population of 650 000 people and is located on the western side of the Ural mountains. It is the capital of the Udmurt Republic, one of the 89 territories that make up the Russian Federation. The leading causes of death among these men were cardiovascular disease, a high proportion of which were sudden, and external causes. While the focus was on alcohol, clearly it was important to be able to collect information on other factors that might act at different points on the causal pathway, such as unemployment, or act as confounders, such as smoking.
The Izhevsk Family Study is not the first study to address the issue of premature mortality among Russian men of working age. The study benefited from an earlier study of 1998-99, also based in the Udmurt Republic, that attempted to address the same issue [4,20]. The previous study used broadly the same design, and found an association between excessive alcohol consumption and premature mortality (below age 55 years) in Russian men. However, this previous study had several weaknesses which threw these findings into question. For example, despite including several questions about alcohol use, the information was limited and inadequate; it included too many subjective/evaluative questions, to which proxies could not reliably respond; no checks were made regarding whether the cases belonged to the sampling frame for the controls, which could have caused selection bias; no data was collected from external, objective sources, nor were the controls themselves interviewed, so there was no opportunity to validate proxy-obtained data. In comparison, the Izhevsk Family Study was strengthened in a number of important respects. Firstly, the questionnaire was improved by excluding subjective/evaluative questions and incorporating more questions on clearly observable behaviours, so as to obtain more valid responses from proxy respondents. Secondly, the questionnaire included an extended range of questions on alcohol consumption and alcohol-related behaviour, including questions about surrogates (manufactured alcohol-containing substances not intended for drinking), the consumption of which subsequently emerged as particularly important. Thirdly, the Izhevsk Family Study carried out interviews with both controls and their proxies, and in addition previously collected administrative and clinical data on the subjects was obtained, providing opportunity for validation of responses obtained from proxy respondents. Finally, it was possible to identify which cases belonged to the sampling frame from which controls were drawn, and hence sensitivity analyses could be conducted.

Study design
Given the above considerations, a population-based case-control design was chosen. Ethical approval for the study was obtained from the committees of the Izhevsk Medical Academy and the London School of Hygiene & Tropical Medicine. Identification of deaths from all causes (cases) occurring over a two-year period (2003-5) was straightforward, as all deaths were registered with the city vital registration bureau (ZAGS). Controls could be selected from computerised voters lists and, as these include not only name, address, and sex, but also date of birth, it was possible to frequency-match to the cases by age group. A total of 1750 case proxies and 1750 control proxies were recruited and interviewed. This represented an overall response rate of 62% among cases (1750/2835 total identified deaths), and 57% among controls (1750/3078 control households approached). Detailed information on recruitment and response rates has been previously reported [21]. Table 1, below, shows the distribution of cases and controls by age group.
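The response rates quoted above, and the idea of frequency-matching controls to cases by age group, can be sketched as follows. This is an illustrative sketch only: the function names, the `(voter_id, age_group)` record layout and the random seed are assumptions, and the study's actual sampling procedure is described in [21] rather than here.

```python
import random
from collections import Counter


def response_rate(interviewed: int, approached: int) -> float:
    """Response rate as a percentage."""
    return 100 * interviewed / approached


def frequency_match(case_age_groups, voters, seed=0):
    """Draw controls so that each age group has as many controls as cases.

    `voters` is a list of (voter_id, age_group) tuples standing in for
    the electoral register; this layout is a hypothetical simplification.
    """
    rng = random.Random(seed)
    needed = Counter(case_age_groups)
    by_group = {}
    for voter_id, group in voters:
        by_group.setdefault(group, []).append(voter_id)
    controls = []
    for group, n in needed.items():
        pool = by_group.get(group, [])
        if len(pool) < n:
            raise ValueError(f"not enough voters in age group {group}")
        # Sample without replacement within the age group.
        controls.extend(rng.sample(pool, n))
    return controls


# Figures as reported in the text.
print(round(response_rate(1750, 2835)), round(response_rate(1750, 3078)))
```

Frequency matching fixes only the age-group distribution of the controls; it does not pair individual controls with individual cases.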

Obtaining information on subjects
Since cases were men who had died, information about them was obtained by interviewing proxy informants. Proxies were also interviewed for all controls, in order to use the same method of data collection. The use of proxies, however, raises issues of the validity of the information they provide about the index subject. At the design stage this was addressed by means of a systematic review of studies that had looked at the extent of agreement between subjects and proxies [22]. The review concluded that, for most of the factors in which we were interested, and in particular alcohol and tobacco consumption, there was evidence from the literature that levels of agreement would be acceptable provided questions were carefully constructed to focus upon index behaviours that the proxy would be able to directly observe. Therefore, in addition to conventional quantity-frequency questions about alcohol consumption, we also developed and used questions which acted as markers of hazardous alcohol consumption, including behaviours such as the frequency with which the subject had fallen asleep with his clothes on because he was drunk, and how often the subject had had a hangover or had been excessively drunk.
The review also indicated that those closest to the index case, typically spouses, tended to provide the most reliable and valid information about the index subject. Thus, selection of proxies involved identification of the 'best informant', defined as someone who had been living with the index for an unbroken period of at least 6 months at the current time (for controls) or at the time of death (for cases). Where there was more than one potential informant with comparable duration of co-residence, the individual was selected according to their relationship to the index, in the order prescribed by the findings of the systematic review: wife/girlfriend/partner, sister, mother, brother, father, offspring, other, as shown in Table 2. The questionnaire included a section about the proxy's own socio-demographic characteristics, since the choice of proxy can influence the validity and reliability of data obtained. Where the index lived alone they were excluded by necessity, which affects the generalisability of the results. If, however, the index had recently moved out of his permanent residence to a new address without the rest of his household, or the potential informants had recently moved away from, or in to, the index's permanent residence and had therefore not lived with the index continuously for the previous 6 months, interviewers followed a series of detailed steps to ensure a consistent proxy selection process. In only 15 case households and 4 control households were interviews carried out with a proxy who had not lived with the index for an unbroken period of 6 months, illustrating that this protocol was successfully followed by the fieldworkers.
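The 'best informant' rule described above can be sketched as a small selection function. This is a simplified illustration, not the fieldwork protocol itself: the `Candidate` record, the collapsing of wife/girlfriend/partner into a single "partner" label, and the treatment of every candidate with at least 6 months of unbroken co-residence as having "comparable" duration are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Priority order taken from the text; "partner" stands for
# wife/girlfriend/partner (a simplification made for this sketch).
RELATIONSHIP_PRIORITY = [
    "partner", "sister", "mother", "brother", "father", "offspring", "other",
]
MIN_MONTHS_CORESIDENT = 6


@dataclass
class Candidate:
    name: str
    relationship: str
    months_coresident: int  # unbroken months living with the index


def select_best_informant(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the highest-priority eligible proxy, or None if nobody qualifies."""
    eligible = [
        c for c in candidates
        if c.months_coresident >= MIN_MONTHS_CORESIDENT
    ]
    if not eligible:
        return None

    def rank(c: Candidate) -> int:
        # Unknown relationships are treated as "other".
        rel = c.relationship if c.relationship in RELATIONSHIP_PRIORITY else "other"
        return RELATIONSHIP_PRIORITY.index(rel)

    return min(eligible, key=rank)


household = [
    Candidate("A", "mother", 120),
    Candidate("B", "partner", 3),   # too recent to qualify
    Candidate("C", "brother", 24),
]
print(select_best_informant(household))
```

In this hypothetical household the partner is passed over because the co-residence requirement is applied before the relationship ordering.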
The distribution of proxies for cases and controls differs, whereby more spouses were available for controls, whilst mothers tended to be the best available proxies for cases. This reflects the inherent difference in marital status and household composition between these two groups. Whilst it is possible that this introduced some bias into the sample due to the possible differences in reporting by different proxy types, provided the proxy fulfils certain criteria related to their ability to reliably report on the index, this will be minimal [22]. Analyses restricted to spouses only obtained very similar results to those which were unrestricted according to proxy type [21].

External validation
In our study, given its focus on hazardous drinking and its potential antecedents, it was desirable to obtain independent information on this issue. This was considered to be particularly relevant to assessing a) differential reporting by proxies by case-control status; b) representativeness of the cases and controls for whom proxy interviews were obtained. Direct evidence of a history of alcohol abuse was obtained from records of treatment in the city Narcology Dispensary (the alcohol and drug treatment clinic). In addition, external information was obtained from two other sources: evidence of any prison stays was obtained from police records; evidence of current benefits obtained and disability status were collected from the Social Security bureau. These made it possible to go well beyond the usual socio-demographic characteristics used when testing for recall bias. Our use of external data is illustrated by the example in Table 3 which shows the association between registration at the Narcology Dispensary and one indicator of hazardous drinking, consumption of surrogates, among cases and controls.
Use of external data also provided an opportunity to fill an important gap in the literature on proxy informants by asking not just whether indexes and proxies agreed but also which were more accurate and, exceptionally, doing so in relation to issues that are often considered stigmatising and therefore subject to biased informant response [22].
All three sources of data are administrative, so are not prone to the subjective biases in questionnaire responses. Since information was recorded prior to death, it could not have been affected by the fact of death itself. Further information on these sources of data is detailed in Table 4.
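A test for a general association between a proxy-reported behaviour (such as surrogate consumption) and an external record (such as registration at the Narcology Dispensary) can be sketched for a 2x2 table as below. This is a minimal sketch assuming an uncorrected Pearson chi-squared test with one degree of freedom; the function name and the cell counts are hypothetical and are not the values in Table 3.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test (no continuity correction) for the table

                     registered   not registered
        reported         a              b
        not reported     c              d

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail of the chi-squared
    # distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, for illustration only.
stat, p = chi_squared_2x2(10, 20, 20, 10)
print(round(stat, 2), round(p, 4))
```

Running the same test separately in cases and in controls is one way to check whether the link between proxy reports and the external record differs by case-control status.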

Information bias
Problems with validity of reporting behaviours are well documented, especially when the respondents are proxies, and the behaviours of interest are culturally sensitive, as alcohol consumption may be considered to be. As described above, steps were taken to quantify the extent of any information bias by comparing questionnaire responses with external sources of information related to reported behaviours, such as registration at the Narcology Dispensary and alcohol consumption. Given that information on cases was obtained from proxies, it was important, for consistency, that proxies were also used to elicit information on controls. However, since the controls themselves could also be interviewed, it was possible to extend the existing literature on proxy informants by conducting interviews with them and comparing their own responses to those of their proxies [22]. This is illustrated by Table 5 which shows the index-proxy agreement for frequency of consuming spirits in a sample of 1564 control households for which complete information was available. In addition, interviews conducted with an additional proxy in a sample of 200 case and 200 control households were a particularly valuable resource in terms of being able to investigate whether it was characteristics of the proxy, or characteristics of the household in which the index lived which affected the response obtained from the proxy. This is illustrated by Table 6, which shows the high proxy-proxy agreement for a question on current smoking status of the index in a sample of 131 case and 178 control households.
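Agreement between two reports of an ordered categorical variable, such as frequency of consuming spirits, is commonly summarised with a weighted Cohen's kappa. The sketch below assumes linear disagreement weights and categories coded as integers starting at 0; the source does not state which weighting scheme was used, so this choice, the function name and the example ratings are assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories: int) -> float:
    """Linear-weighted Cohen's kappa for categories coded 0..n_categories-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(ratings_a)
    k = n_categories

    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for i, j in zip(ratings_a, ratings_b):
        observed[i][j] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the extreme cells.
        return abs(i - j) / (k - 1)

    observed_disagreement = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0:
        raise ValueError("kappa undefined: both raters used a single category")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical index and proxy answers on a 3-level frequency scale.
index_answers = [0, 1, 2, 2, 1, 0, 2, 1]
proxy_answers = [0, 1, 2, 1, 1, 0, 2, 2]
print(weighted_kappa(index_answers, proxy_answers, 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; near-misses on the ordinal scale are penalised less than distant disagreements.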

Selection bias
A challenge every case-control study must overcome is ensuring that the cases and controls are drawn from the same underlying population. Since these two groups are often recruited through different routes, it can be difficult to be certain that case-control selection bias is not affecting analyses. In order to address this issue, within the Izhevsk Family Study, analyses were repeated with a restricted sample which comprised only the cases (who had been originally identified via ZAGS) successfully located within the electoral register, the sampling frame used to select controls.
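The restricted-sample check described above can be sketched as a comparison of an effect estimate in the full sample and in the sample where cases are limited to those found in the control sampling frame. The example below uses a crude (unadjusted) odds ratio purely for illustration; the `Subject` record, the field names and the counts are hypothetical, and the study's own analyses are not reproduced here.

```python
from typing import Iterable, List, NamedTuple


class Subject(NamedTuple):
    is_case: bool
    exposed: bool      # e.g. proxy-reported hazardous drinking
    in_register: bool  # located in the control sampling frame


def odds_ratio(subjects: Iterable[Subject]) -> float:
    """Crude odds ratio of exposure in cases versus controls."""
    a = b = c = d = 0
    for s in subjects:
        if s.is_case and s.exposed:
            a += 1
        elif s.is_case:
            b += 1
        elif s.exposed:
            c += 1
        else:
            d += 1
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty off-diagonal cell")
    return (a * d) / (b * c)


def restrict_cases_to_register(subjects: Iterable[Subject]) -> List[Subject]:
    """Keep all controls, and only those cases found in the register."""
    return [s for s in subjects if not s.is_case or s.in_register]


# Hypothetical data, for illustration only.
subjects = (
    [Subject(True, True, True)] * 30
    + [Subject(True, True, False)] * 10
    + [Subject(True, False, True)] * 20
    + [Subject(True, False, False)] * 5
    + [Subject(False, True, True)] * 15
    + [Subject(False, False, True)] * 50
)
print(odds_ratio(subjects), odds_ratio(restrict_cases_to_register(subjects)))
```

If the two estimates are close, selection of cases from outside the control sampling frame is unlikely to be driving the result.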
It must be noted that whilst we have argued that a case-control approach is more appropriate to answer a study question such as this than a cohort design, the difficulties of such research also render this far from ideal. This is illustrated by the difficulties encountered in recruiting cases. Of a total of 2837 deaths in men aged 25-54 in Izhevsk in the period during which this research was carried out that were eligible for inclusion, almost 15% (415) lived alone and could not be included in this study, 10% (284) met with respondent refusal, and the correct address could not be identified for 13% (378). This means that our analyses were finally based on around 62% of eligible deaths, and there are likely to be some specific biases present, although the same percentage of cases and controls not interviewed were registered at the Narcology Dispensary [21]. Despite this, the profile of mortality by cause in the sample analysed is similar to that for all men living in Izhevsk, all men living in the Udmurt Republic, and all men living in Russia in the age group of interest, as illustrated by Figure 1.
[Table 3 footnote: chi-squared test for a general association between reported surrogate consumption and registration at the Narcology Dispensary.]
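The recruitment losses quoted above can be checked with simple arithmetic. The sketch below uses only the counts stated in the paragraph (2837 eligible deaths; 415, 284 and 378 not interviewed for the three stated reasons); the function and variable names are hypothetical.

```python
ELIGIBLE_DEATHS = 2837
NOT_INTERVIEWED = {
    "index lived alone": 415,
    "respondent refusal": 284,
    "address not identified": 378,
}


def percent(part: int, whole: int) -> float:
    return 100 * part / whole


def recruitment_summary(eligible: int, losses: dict):
    """Return (remaining count, percentages by reason plus 'remaining')."""
    summary = {reason: percent(n, eligible) for reason, n in losses.items()}
    remaining = eligible - sum(losses.values())
    summary["remaining"] = percent(remaining, eligible)
    return remaining, summary


remaining, summary = recruitment_summary(ELIGIBLE_DEATHS, NOT_INTERVIEWED)
for reason, pct in summary.items():
    print(f"{reason}: {pct:.1f}%")
```

On these counts the three stated reasons account for a little under 38% of eligible deaths, which is consistent with the figure of around 62% given in the text.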

Additional information on case characteristics
While first level analyses focused on deaths from all causes, it was clearly important to be able to look at specific causes of death as entered on the death certificate by the certifying doctor or, where an autopsy was conducted, from the outcome of this. In Russia all deaths are registered and most deaths in the age group of interest (25-54) undergo autopsy, with most being forensic. Cause of death is coded using the 10th revision of the International Classification of Diseases. Yet death certificates provide only limited information and may not include information on pathological features of interest, such as alcoholic liver disease, where this was not directly associated with death. Hence, a structured proforma was designed to collect detailed data obtained at autopsy, including blood alcohol levels. This and other data regarding the circumstances of death provided an opportunity for additional population-based analyses concerning the problem of premature mortality in Russia, including analysis of blood-alcohol concentration at time of death or specific cause of death, and enabled assessment of the validity of the certified cause of death. Whilst some such analyses are separate from the main case-control analysis, they enrich the data, providing important additional information and the prospect of validation of questionnaire-obtained data.

Conclusion
To address the methodological challenge outlined in this paper, we adopted a population-based case-control design. We started with all deaths in men aged 25-54 (and a representative sample of live controls) from an entire defined population. We were therefore able to assess whether or not the cases and controls for whom we could not obtain an interview were representative of the target population. A great strength of this study was therefore that it was not subject to the same selection biases typically encountered by this type of research.
[Table 5 note: Weighted Cohen's kappa coefficient for agreement between 1564 index-proxy pairs = 0.61.]
Although based on a conventional case-control study, this study has three unusual features. First, although information could not be obtained from cases, it could be obtained from proxy informants provided certain precautions were taken. It was important to select proxies who had sufficient knowledge of the index to be able to report on their characteristics and behaviours, and to pose questions that addressed characteristics and behaviours that were observable. Second, given the need to ensure representativeness of controls and tackle recall bias, external sources of objective data relevant to the exposures being studied but blind to case/control status were used. Third, to overcome potential problems in defining outcomes, a detailed system of data collection to validate and extend information on causes of death was employed. Although adapted to the specific circumstances of Russia, we believe that the principles that we have set out here will be helpful to others seeking a better understanding of the causes of conditions associated with sudden and unexpected death in a range of situations, including deaths at young and middle ages, or deaths from certain causes such as violence, accidents or tuberculosis. Moreover, the study illustrates that with sufficient attention given to issues of data quality and bias at the design stage, ancillary information can be collected which can be used to assess and demonstrate the validity of data collected using methods that might otherwise be regarded as questionable.

Competing interests
The author(s) declare that they have no competing interests.

Authors' contributions
The idea of the study was developed by VS with input from DL and MM. All authors contributed to the detailed design of the protocol. LS was responsible for the conduct of the interview field work, NK for coordination of all other data collection in Izhevsk, EA for data capture systems and monitoring of field work progress, ST for the detailed design of the questionnaire (including the systematic review of proxy validity) and coordination of the project and management of the final data set and MM for the autopsy and setting up of the surrogate toxicological analyses. ST drafted the text which was commented upon by all authors. All authors read and approved the final manuscript.
Figure 1 Distribution of deaths by ICD10 chapter among the sample analysed, men living in Izhevsk, men living in Udmurtia and men living in Russia aged 25-54.