Probabilistic linkage was performed between the databases of an epidemiological survey on hypertension conducted in Ilha do Governador in the city of Rio de Janeiro, Brazil, and that of the Mortality Information System (Portuguese acronym: SIM - Sistema de Informações de Mortalidade) from 1991 to 2009 in the State of Rio de Janeiro.
The baseline study on arterial hypertension in Ilha do Governador
The baseline study was a survey conducted in Ilha do Governador between July 1991 and May 1992. It aimed to estimate the prevalence of hypertension and other cardiovascular risk factors in adults aged 20 years or older, who were selected through probabilistic sampling of households at three economic levels.
Study site and population
Ilha do Governador is a political-administrative region comprising 15 districts and is one of the 33 regions into which the city of Rio de Janeiro is subdivided. It is an island of 33.53 km² located in the Baía de Guanabara. According to the 1991 census, its population accounted for 3.6 % (197,158) of the resident population of the municipality of Rio de Janeiro, including 129,474 adults aged 20 years or older (65.7 %), of whom 60,244 (46.5 %) were men and 69,230 (53.5 %) were women. The study area is socially and economically heterogeneous, has a demographic profile very similar to that of the city of Rio de Janeiro, and has health facilities of low, medium and high complexity.
The survey used a three-stage cluster sample drawn from the 187 census tracts that formed Ilha do Governador. The sample was stratified according to the average household income of each sector: low-income, middle-income or high-income. In the first stage, ten census sectors in each of the three socioeconomic strata (low, middle and high income) were selected systematically with probability proportional to the size of each sector. In the second stage, 25 households were randomly selected in each of the ten sectors selected in the first stage. In the third stage, all subjects aged 20 years or more living in the selected households were included in the final sample.
To estimate the sample size, a hypertension prevalence of 20 % was assumed, with a sampling error of 2.5 % and a 95 % confidence level (bilateral alpha of 2.5 %), considering a reduction of heterogeneity by 1/3 due to the clustering effect, which resulted in approximately 1,500 individuals.
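The arithmetic behind this figure can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual calculation: it uses the usual normal-approximation formula for a proportion and reads the "reduction of heterogeneity by 1/3" as an inflation of the simple-random-sample size by 1 / (1 - 1/3) = 1.5. The function name and that interpretation are assumptions.

```python
from statistics import NormalDist


def simple_random_sample_size(prevalence, margin, confidence=0.95):
    """Sample size for estimating a proportion under simple random sampling."""
    # Two-sided critical value of the standard normal (about 1.96 for 95 %).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * prevalence * (1 - prevalence) / margin ** 2


# Values stated in the text: 20 % prevalence, 2.5 % sampling error, 95 % confidence.
n_srs = simple_random_sample_size(prevalence=0.20, margin=0.025)

# ASSUMPTION: clustering removes 1/3 of the information, so the sample is
# inflated by 1 / (1 - 1/3) = 1.5. The source does not give this formula.
design_effect = 1 / (1 - 1 / 3)
n_cluster = n_srs * design_effect

print(f"simple random sample: {n_srs:.0f}, with clustering: {n_cluster:.0f}")
```

Under these assumptions the result is a little under 1,500, which is consistent with the rounded figure reported above.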
The same number of households was selected in each stratum, accounting for about 500 individuals in each one (an average of 2.46 adults per household according to census information). Allowing for a loss of 10 % of residents in the selected households, the required sample was 250 households in each stratum.
The overall effective sample across the three strata was 674 households (89.9 % of those planned).
Trained interviewers administered an individual questionnaire covering demographic information (age, gender, skin color, nationality and marital status); socioeconomic characteristics (education, occupation, labor relations and income); lifestyle (smoking, alcohol consumption, diet and exercise); previous hypertension diagnosis and antihypertensive treatment; morbidity (history of previous diseases related to the cardiovascular, renal and respiratory systems, use of medications); and reproductive history of women (contraceptive use, pregnancies).
Previously trained and supervised examiners also took measurements at home, including weight, height, radial pulse, arm circumference and blood pressure, following a protocol to ensure the accuracy and standardization of the data collected.
Blood pressure levels were determined with a mercury sphygmomanometer designed to avoid measurement errors. Two blood pressure measurements were taken during the same visit on the left arm, with the individual seated and with an interval of at least 20 min between them. The second measurement was used to classify the individual as hypertensive or not.
Mortality Information System
The Ministry of Health is responsible for the National Mortality Information System, which contains mortality data for the whole country. It holds demographic information such as age, date of birth, gender, skin color, marital status, nationality and place of residence, as well as information about the death, such as the date and cause of death recorded on the death certificate. Information about the accuracy of death certificate coding can be found in Jorge et al.
The database linkage was performed using the software Reclink 3, following the steps and parameters recommended by Camargo & Coeli. Reclink 3 is a system for database linkage based on the probabilistic record linkage technique, which matches observations between two datasets when no perfect key fields exist. The method consists of standardizing the databases and linking them in blocking steps that compare the variables “Name/Date of Birth/Sex” to determine a cut-off score.
The method had a sensitivity of 85.5 %, specificity of 99.4 %, positive predictive value of 98.1 % and negative predictive value of 94.9 % when applied to databases similar to those of the present study.
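To make the idea of blocking and scoring concrete, the sketch below shows a generic Fellegi-Sunter-style agreement score in Python. It is not Reclink 3's implementation: the m/u probabilities, the blocking key (sex plus year of birth), the exact-match comparison and the records are all hypothetical, and real linkage software typically uses phonetic codes and approximate string comparison for names.

```python
from math import log2

# HYPOTHETICAL parameters: m = P(field agrees | true match),
# u = P(field agrees | non-match). Not the values used in the study.
FIELD_PARAMS = {
    "name": (0.92, 0.01),
    "birth_date": (0.90, 0.05),
    "sex": (0.95, 0.50),
}


def match_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    score = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            score += log2(m / u)              # agreement weight (positive)
        else:
            score += log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def candidate_pairs(survey, deaths, block_key):
    """Blocking: only compare records that share the same blocking key."""
    index = {}
    for death in deaths:
        index.setdefault(block_key(death), []).append(death)
    for person in survey:
        for death in index.get(block_key(person), []):
            yield person, death


# Fictitious records for illustration only.
survey = [{"name": "MARIA SILVA", "birth_date": "1950-03-12", "sex": "F"}]
deaths = [
    {"name": "MARIA SILVA", "birth_date": "1950-03-12", "sex": "F"},
    {"name": "JOSE SOUZA", "birth_date": "1950-07-01", "sex": "M"},
]
pairs = list(candidate_pairs(survey, deaths, lambda r: (r["sex"], r["birth_date"][:4])))
scores = [match_score(a, b) for a, b in pairs]
```

Pairs whose score exceeds a chosen cut-off would be accepted as links, and borderline pairs would usually be reviewed manually.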
Cardiovascular causes of death (obtained from the SIM database) were classified with codes 390 to 459 of the 9th revision of the International Classification of Diseases (ICD-9) for deaths between 1991 and 1995, and with codes I00 to I99 of chapter IX of ICD-10 for deaths in subsequent years.
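A minimal sketch of this coding rule is shown below. The function name and the handling of code formatting (optional dot, extra subcategory digits) are assumptions; the layout of the cause-of-death field in the SIM database is not described here.

```python
def is_cardiovascular(code, revision):
    """Return True if an underlying-cause code falls in the cardiovascular range.

    revision: 9 for ICD-9 (categories 390-459), 10 for ICD-10 (I00-I99).
    """
    code = code.strip().upper().replace(".", "")
    if revision == 9:
        category = code[:3]  # three-digit ICD-9 category
        return category.isdigit() and 390 <= int(category) <= 459
    if revision == 10:
        # Chapter IX: letter I followed by two digits.
        return len(code) >= 3 and code[0] == "I" and code[1:3].isdigit()
    raise ValueError("revision must be 9 or 10")


# Example codes: 410 / I21.9 (acute myocardial infarction) are cardiovascular,
# 250 (diabetes mellitus) and J18 (pneumonia) are not.
examples = [("410", 9), ("250", 9), ("I21.9", 10), ("J18", 10)]
flags = [is_cardiovascular(code, rev) for code, rev in examples]
```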
The characteristics of the studied population and the proportion of deaths associated with either cardiovascular or non-cardiovascular causes were analyzed. The prevalence of hypertension was estimated and adjusted according to the sampling design with 95 % confidence interval (95 % CI) using the statistical routines for complex samples, Survey (svy), in the program Stata 11.0.
Subjects with systolic blood pressure (SBP) lower than 140 mmHg and diastolic blood pressure (DBP) lower than 90 mmHg who were not on any antihypertensive treatment at the time of the study were classified as non-hypertensive. Subjects with SBP higher than or equal to 140 mmHg or DBP higher than or equal to 90 mmHg [2, 3] at the second blood pressure measurement, as well as those on antihypertensive treatment regardless of blood pressure level, were classified as hypertensive. Treatment was defined as the use of antihypertensive drugs.
We classified the hypertensives as:
Untreated hypertensives: high BP without any treatment at the time of the study.
Controlled hypertensives: SBP < 140 mmHg and DBP < 90 mmHg while being treated at the time of the study.
Uncontrolled hypertensives: high BP despite treatment at the time of the study.
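The classification rules above can be written as a small decision function. This is an illustrative sketch only; the function name and the category labels are chosen here for readability and are not taken from the study's analysis code.

```python
def classify_bp(sbp, dbp, on_treatment):
    """Classify a subject from the second BP measurement (mmHg) and treatment status."""
    high_bp = sbp >= 140 or dbp >= 90
    if not on_treatment:
        return "untreated hypertensive" if high_bp else "non-hypertensive"
    # Anyone on antihypertensive drugs is hypertensive; BP decides control.
    return "uncontrolled hypertensive" if high_bp else "controlled hypertensive"


# Illustrative values, not study data.
print(classify_bp(128, 82, on_treatment=False))
print(classify_bp(150, 85, on_treatment=False))
print(classify_bp(132, 84, on_treatment=True))
print(classify_bp(138, 94, on_treatment=True))
```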
Subjects were considered cigarette smokers or alcohol users when they reported smoking or drinking alcoholic beverages, regardless of frequency. Leisure-time physical activity was classified as present when practiced at least once a week; occasional when practiced less than once per week; or absent when it was never practiced.
For the body mass index (BMI) criteria, subjects were categorized as normal weight with a BMI lower than 25 kg/m2 and as overweight with a BMI greater than or equal to 25 kg/m2.
Skin color was classified by the interviewer based on the observation of ethnic characteristics.
Income was categorized according to the stratification used in the study design of the hypertension survey. The subjects’ educational levels were classified into three categories: illiterate; intermediate education (incomplete or complete primary education, complete secondary education, or incomplete higher education); and higher education (completed higher education).
For the survival analysis, the time interval between the date of the subjects’ participation in the survey and the date of cardiovascular death was used. The cumulative probabilities of survival for subjects with hypertension and with cardiovascular disease risk factors were estimated by the Kaplan-Meier method, and differences were considered statistically significant when p < 0.05 using the log-rank test in the bivariate analysis.
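For readers unfamiliar with the Kaplan-Meier (product-limit) estimator, the following pure-Python sketch shows how the survival curve is built. It ignores the sampling weights used in the study, and it assumes that subjects who did not die of cardiovascular causes are treated as censored; the data are made up.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time of each subject
    events: 1 if the subject had the event (e.g. cardiovascular death), 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (years of follow-up), not study data.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

With these toy values the estimated survival drops to about 0.8, 0.6 and 0.3 at times 2, 3 and 5 respectively.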
The Cox proportional hazards model and the Survey routine (Stata 11.0) were used for multivariate analysis. The risk factors with p < 0.20 in the bivariate analysis were tested in the model as potential confounders. Regardless of statistical significance or impact on the hazard ratio (HR), the variables sex, age and educational level were kept in the final model because they were associated with both exposure and outcome.
This study was approved by the Research Ethics Committee of the Institute of Studies in Public Health, Federal University of Rio de Janeiro (Instituto de Estudos em Saúde Coletiva, Universidade Federal do Rio de Janeiro - IESC-UFRJ) on 15 March 2012 (process number: 6813). The identified mortality database was released by the state health board of Rio de Janeiro (RJ) after the researchers submitted a term of commitment guaranteeing data security and confidentiality.