The CMDR database
The CMDR collects data on homeless deaths (street homeless and those living in homeless shelters) in order to implement the means and actions needed for research on, and reporting of, violent causes of homeless deaths, to ensure a dignified burial, and to accompany people in mourning. The information collected comes mainly from various informal reporting channels: associations serving homeless people, institutions, relatives and the media. Associations, institutions or relatives can report a death by filling in a form available on the CMDR website, or by informal means.
When the CMDR learns of a death through the media, or when data are missing, it contacts services for homeless people in order to learn more about the death. From January 2008 to December 2010, 1145 homeless deaths were registered in this database (source A).
The CépiDc database
The death certificate comprises two parts, one administrative and one medical. INSEE manages the administrative part, which contains the name of the deceased and civil status information. CépiDc manages the non-nominative medical part, which contains the medical causes of death, the dates of birth and death, and the cities of residence and death. From January 2008 to December 2010, CépiDc registered and coded a total of 1.6 million deaths for the entire population of France. For 241 of these deaths, 'homelessness' (street homeless and those living in homeless shelters) was reported (source B). This information is coded using the International Classification of Diseases, 10th revision (ICD-10), as 'Z59.0', in the section 'Factors influencing health status and contact with health services'. According to World Health Organization (WHO) guidelines, medical certifiers are to report homelessness if they consider that it contributed to a person’s death.
Information available in both databases included age, sex, birth and death dates, city of death, birth and death places (housing, hospital, street…), and causes of death.
Neither database is freely accessible; specific permission is required for each. This study was approved by the French Commission for Data Protection and Liberties (Commission Nationale de l'Informatique et des Libertés: CNIL).
The capture-recapture method
When two sources are considered, the total number of deaths, N, is estimated from the number of deaths in each source, NA and NB, and the number of deaths common to both, NAB [26], as:
N = (NA × NB) / NAB
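The two-source estimate can be sketched in Python, assuming the classical Lincoln-Petersen form N = NA × NB / NAB. The large-sample variance used for the confidence interval is a common companion to this estimator and is an assumption here, as are the function name and the counts, which are hypothetical and not the study's data.

```python
import math


def lincoln_petersen(n_a, n_b, n_ab, z=1.96):
    """Two-source capture-recapture estimate with an approximate confidence interval.

    n_a, n_b: number of cases in each source; n_ab: cases found in both.
    """
    if n_ab <= 0:
        raise ValueError("at least one common case is needed")
    n_hat = n_a * n_b / n_ab
    # Large-sample variance often paired with this estimator (assumed, not from the text).
    variance = n_a * n_b * (n_a - n_ab) * (n_b - n_ab) / n_ab ** 3
    half_width = z * math.sqrt(variance)
    return n_hat, (n_hat - half_width, n_hat + half_width)


# Hypothetical counts for illustration only.
estimate, interval = lincoln_petersen(n_a=200, n_b=100, n_ab=20)
print(estimate, interval)
```

The fewer the cases in common, the larger both the estimate and its interval, which is why the quality of record matching matters so much for this method.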
Applying the capture-recapture method requires some general validity conditions:
– Independence of sources (the probability that an observation is in one source does not depend on whether it is in the other source): Since different actors produce the sources, independence is plausible by construction. With only two sources available, the qualitative assessment of the dependency between the sources could not be carried out.
– Adequate matching (deaths reported by one source can be matched to those reported by the other source without mismatches): Given the strict anonymity rules of the CépiDc database, this assumption is difficult to assess.
– Capture homogeneity (all persons in the population have the same chance of being observed in any source): The homogeneity of the capture was studied by comparing the distributions of homeless deaths from source A and source B, by gender, age, season, place and region of death. When the distribution of the deaths for those variables was significantly different, stratified estimation was undertaken [24].
– Closed population (no movement of subjects into or out of the population): Since the study population consisted of dead people, this assumption was fulfilled.
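Where capture is heterogeneous across a variable, a stratified estimate can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: each stratum is estimated with the Lincoln-Petersen form, and the overall total is the sum of the per-stratum estimates. Stratum names and counts are hypothetical.

```python
def two_source_estimate(n_a, n_b, n_ab):
    """Lincoln-Petersen estimate for one stratum (assumed estimator)."""
    if n_ab <= 0:
        raise ValueError("a stratum needs at least one common case")
    return n_a * n_b / n_ab


def stratified_estimate(strata):
    """strata maps a stratum label to (n_a, n_b, n_ab).

    Returns the per-stratum estimates and their sum.
    """
    per_stratum = {
        label: two_source_estimate(*counts) for label, counts in strata.items()
    }
    return per_stratum, sum(per_stratum.values())


# Hypothetical counts by gender, for illustration only.
strata = {"men": (120, 60, 12), "women": (30, 20, 5)}
per_stratum, total = stratified_estimate(strata)
print(per_stratum, total)
```

A stratum with no common case cannot be estimated this way, so in practice sparse strata may need to be merged before estimation.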
Identification of common cases among sources
Using a capture-recapture method requires matching the records from the two sources in order to identify those in common. In the absence of any direct identifier in the CépiDc database, matching could only be performed indirectly, using a set of variables common to both databases: age, gender, place of death and date of death.
However, the CMDR database has many missing or inaccurate records, which prevents an optimal matching process. As the full CépiDc database is exhaustive, a preliminary matching of source A against the entire CépiDc database (1.6 million records) was performed. This matching proceeded in several steps. As the day of death and age were necessary to avoid duplicates, the matching sample was divided into 4 groups according to the presence or absence of these two variables. For each group, a variable was included in the matching combination if its completeness rate was higher than 80%. For each death reported in the CMDR database, the algorithm searched for a corresponding record in the CépiDc database. If no observation with the same combination was found in both files, the algorithm allowed partial matches: the variables were removed one at a time, and a new search looked for one or more observations matching the reduced combination. Where several CépiDc observations were consistent with a single death in the CMDR database, only the most informative combination of matched deaths was kept.
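The relaxation step can be sketched as follows. This is an illustration rather than the study's algorithm: it assumes that "sequentially removed" means dropping one variable at a time, starting with the least essential, and all field names and records are hypothetical.

```python
def find_matches(record, candidates, keys):
    """Search candidates for records agreeing with `record` on `keys`.

    Tries the full combination first; if nothing matches, drops one key
    at a time, starting from the end of `keys` (least essential last).
    Returns (keys_used, matching_candidates).
    """
    def search(active_keys):
        return [
            c for c in candidates
            if all(c.get(k) == record.get(k) for k in active_keys)
        ]

    hits = search(keys)
    if hits:
        return tuple(keys), hits
    for dropped in reversed(keys):
        active_keys = tuple(k for k in keys if k != dropped)
        hits = search(active_keys)
        if hits:
            return active_keys, hits
    return (), []


# Hypothetical records, for illustration only.
cmdr_death = {"death_date": "2009-01-15", "age": 45, "sex": "M", "death_city": "Paris"}
cepidc_records = [
    {"id": 1, "death_date": "2009-01-15", "age": 45, "sex": "M", "death_city": "Lyon"},
    {"id": 2, "death_date": "2009-01-15", "age": 52, "sex": "F", "death_city": "Paris"},
]
keys = ("death_date", "age", "sex", "death_city")
keys_used, hits = find_matches(cmdr_death, cepidc_records, keys)
print(keys_used, [h["id"] for h in hits])
```

When a reduced combination returns several candidates, a further rule is needed to pick one; the text keeps the most informative combination, which this sketch leaves to the caller.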
Of the 1145 deaths in source A, 391 could not be retrieved from the CépiDc database because of missing or inaccurate data and were excluded from subsequent calculations. The capture-recapture method was therefore applied to source A' (n = 754), that is, source A without these 391 deaths. Finally, the total number of deaths, N, was estimated from the numbers of deaths in each source, NA' and NB, and from the number of deaths present in both sources, NA'B [26]. Here, A' is the set of the 754 deaths from the CMDR database (matched to the entire CépiDc database), B is the set of the 241 homeless deaths from the CépiDc database, and A'B is the set of deaths present in both sources.
The completeness of each source was calculated as the ratio of the number of homeless deaths in that source to the total number of deaths estimated by capture-recapture.
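The completeness ratio can be illustrated with a short Python sketch. The source sizes (754 and 241) come from the text above, but the estimated total is a hypothetical placeholder, not a result of the study, and the function name is an assumption.

```python
def completeness(n_source, n_estimated):
    """Share of the estimated total number of deaths captured by one source."""
    if n_estimated <= 0:
        raise ValueError("estimated total must be positive")
    return n_source / n_estimated


# Placeholder total for illustration only; NOT the study's estimate.
n_estimated = 5000.0
source_sizes = {"A'": 754, "B": 241}  # counts reported in the text
rates = {name: completeness(n, n_estimated) for name, n in source_sizes.items()}
for name, rate in rates.items():
    print(f"source {name}: {rate:.1%}")
```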