
Cancer and the healthy immigrant effect: a statistical analysis of cancer diagnosis using a linked Census-cancer registry administrative database



A large volume of research has been published on both the socioeconomic and demographic determinants of cancer and on the health of immigrants and minority groups. Yet because of data limitations, little research examines differences in cancer incidence between immigrants and non-immigrants and among immigrants defined by region of birth and time in the host country. In particular, it is not known whether a healthy immigrant effect is present for cancer and, if so, whether this advantage is lost with additional years of residence in the host country.


This paper uses a large data file from Statistics Canada that links Census information on immigrant status, socioeconomic status including educational attainment, and other person-level information with administrative data on cancer and mortality over a continuous 13 year period of observation. It estimates discrete and continuous time duration models to identify differences in cancer diagnosis by immigrant subgroup after controlling for a variety of potential confounders. Differences in historical smoking behavior are not observable at the individual level in the dataset but are accounted for indirectly using various methods.


Results in general confirm the existence of a healthy immigrant effect for cancer in that, overall, recent immigrants to Canada are significantly less likely than otherwise comparable non-immigrant Canadians to be diagnosed with any cancer and the most common forms of cancer by site. As well, this gap appears to decline with additional years in Canada for immigrant men and women, eventually converging to Canadian-born levels. Differentiating among immigrant subgroups by period of arrival and country of birth reveals significant variation across immigrant subgroups, with immigrant men and women from developing countries typically having a lower likelihood of being diagnosed with cancer than immigrants from the US, UK and continental Europe. As well, controlling for immigrant heterogeneity this way weakens the conclusion that the gap narrows with years in Canada. Immigrant men overall continue to exhibit convergence to Canadian-born levels for diagnosis of any cancer and for prostate cancer, while immigrant women exhibit narrowing over time only for breast cancer. Although smoking behavior is not directly observed, controlling for subgroup-specific lifetime smoking behavior using survey data has only a relatively minor effect on the estimated differences.


The specificity of the results by cancer type, gender, immigrant status and ethnicity provides useful guidance for future research by helping to narrow the possible channels through which social and economic characteristics may be affecting cancer incidence.



A large volume of research has considered the role of both demographic factors such as race or ethnicity and contextual factors such as socioeconomic status and place of residence in affecting the incidence of cancer. Depending on the cancer type, only 2–10% of cancers have been estimated to be the result of a mutation in, or the operation of, a particular gene (Lee Davis, Donovan, et al. [19]). This suggests that the overwhelming majority of cancer-causing factors are linked to individual behaviour, environmental context, or interactions between genes and these other contextual factors. Identifying and evaluating systematic differences in cancer incidence by socioeconomic status or geographic region of residence can advance our understanding of cancer and so help public health agencies to design policies for those at greater risk of the disease. Socioeconomic factors such as education and income have long been important correlates of cancer incidence and survival. Socioeconomic status may reflect a range of possible cancer determinants that are often not observed in administrative data sources, including health behaviors such as smoking, diet and activity, timely use of preventative health services, cancer screening, delayed childbirth, and exposure to carcinogens through occupational exposures (Link and Phelan [22]; Aronson et al. [3], [8] Lightfoot and Berriault [21]). Demographic and socioeconomic inequalities in cancer incidence persist even with improved knowledge of risk factors and improvements in early detection and treatment, (Clegg et al. [7]).

Much of this work has been limited in terms of available data in one or more dimensions. Data collected as part of a case control study are typically small in scope, which limits generalizability as well as the types of statistical analyses that can be conducted [4]. Health survey data report information on personal demographic and socioeconomic characteristics; however, the number of cancer cases among immigrant or minority groups is relatively small and the occurrence of cancer is self-reported. Data drawn from national or state registries are usually limited to basic demographic information on age, sex and location of residence, which means that information on many individual characteristics such as education, immigrant status and personal or household income is not available. As well, by their nature Cancer Registries contain data only on those diagnosed with cancer and not the underlying at-risk population. The underlying population is approximated using population counts by defined geographic area, usually drawn from a population census. This means that socioeconomic status must be measured using area-level data on median household income or similarly constructed variables. Unlike their Canadian counterpart, the US SEER databases contain data on race and some state registries report information on country of birth, though sometimes specific place of birth and immigrant status have been imputed from information such as the patient’s name and age when a social insurance number is issued ([11, 24]). Cancer incidence and outcomes vary significantly by race and ethnicity in the US literature, and in particular overall cancer incidence rates are lower among Asians and Pacific Islanders than among non-Hispanic whites [44]. Newman [30] finds that while black women have a lower incidence of all-stage breast cancer, they also experience higher mortality rates (see also [27]). Since race/ethnicity is not recorded in Canadian Cancer Registry data, Hislop et al. [14] and Bashash et al. [5] impute ethnicity based on a review of patient names in the British Columbia Cancer Registry. Clegg et al. [6] compare race, Hispanic status and immigrant status between registry information and self-reports using the SEER-National Longitudinal Mortality Study linked database and find that while the degree of agreement was excellent for race and Hispanic status, it was low for immigrant status.

These limitations have encouraged interest in and development of linked individual-level datasets that combine administrative, survey and/or census information. Clegg et al. (2009) use the US SEER-NLMS dataset that links state cancer registry data to the National Longitudinal Mortality Study, a component of the Current Population Surveys linked to records on cause of death. A key advantage of this dataset is that individual data on socioeconomic status are available. Clegg et al. find consistent gradients in incidence rates by socioeconomic status for various forms of cancer. They find that the incidence of lung cancer and colorectal cancer was lower for individuals with higher socioeconomic status while the incidence of prostate cancer for men and breast cancer for women increased with socioeconomic status (see also [18]). For Canada, McDermott et al. [26] estimate standardized incidence rates for immigrants from many countries of birth and for most forms of cancer based on an administrative dataset that links immigrant landing records with cancer registry and mortality administrative data. Overall, immigrants tend to have lower standardized incidence rates for most types of cancer though exceptions were a relatively higher incidence of cervical, nasopharyngeal and liver cancers among East Asian immigrants. The dataset that McDermott et al. [26] uses is limited in that it includes individual data only on immigrants and also does not have individual data on socioeconomic status. (Linked datasets are more widely available in some European countries such as the Swedish Family Cancer Database, used by [12, 13, 23, 28]).

While research on population-level cancer outcomes among immigrants is limited, there is a large body of work on many different dimensions of the health of immigrants, including disease prevalence, health behaviors, and service use such as cancer screening. The general conclusion is that a healthy immigrant effect exists whereby recent immigrants are in better health (fewer chronic conditions, better self-reported health, less obesity, fewer negative health behaviors) than otherwise comparable host country non-immigrants (e.g., [1, 2, 17, 25, 33], Singh and Siahpush, [39, 40]). In a systematic review of the literature on the health of immigrants to Canada, De Maio [9] finds that there is overwhelming support for the healthy immigrant effect across a range of health outcomes. Possible reasons for the healthy immigrant effect include the self-selection of immigrants who may be healthier and wealthier, differences in current and previous health behaviors such as smoking, alcohol consumption and diet, as well as differences in access to health services, the takeup of health screening, and/or differences in attitudes to health maintenance that lead to differences in the incidence of diagnosed conditions [25]. However, for some measures of health this advantage dissipates over years in the host country and converges to native-born levels within 10–15 years or sooner. One limitation in much of this work is that information on health outcomes is self-reported. (See also [31], who reviews data development in Canada relevant to immigrant health research). One example of recent work using linked survey-administrative data in Canada is Ng [32], who uses Canadian census data linked to vital statistics and finds a healthy immigrant effect in mortality that persists even after 20 or more years in Canada. He also finds significant variation by country of birth and area of residence after migration (see also [43], for an examination of race/ethnicity and health in Canada).

This paper uses a large linked individual level dataset on Canadian residents to investigate differences in the incidence of cancer between immigrants and non-immigrants after controlling for a range of demographic and socioeconomic characteristics. We investigate how cancer incidence varies by country of birth, year of arrival, and time spent in Canada. In particular we are interested in examining whether there is a healthy immigrant effect in terms of the incidence of cancer and whether any gap in cancer incidence narrows with additional years in Canada. In addition, we distinguish between first generation immigrants and Canadian born residents by racial or ethnic groups. We consider the overall incidence of cancer in all sites as well as the incidence of the most common forms of cancer: prostate, colorectal and lung cancer for men, and breast, colorectal, and lung cancer for women. We also examine the robustness of the findings to controlling indirectly for differences in smoking behavior. To our knowledge, at least some parts of each question we examine have not previously been studied in Canada using a large population-based dataset.


The data were prepared by Statistics Canada and are based on a large cohort of approximately 2.5 million individuals aged 25 or older who participated in the long form of the 1991 Census of Canada. The Census contains detailed information as of 1991 on individual education level, immigrant status, country of birth, year of arrival, ethnicity, occupation, and personal and household income from various sources. Census records on individuals in this cohort are linked to 1990–91 tax records of Canadian tax filers, Canadian Cancer Registry (CCR) data from 1984 to 2003 and the Canadian Mortality Database (CMD) from 1991 to 2008. (Statistics Canada released an updated file with cancer data to 2008 but disclosure restrictions precluded us from using the updated dataset for this project.) Overall approximately 80% of Census respondents were able to be matched to the administrative data. Wilkins et al. [45], Peters and Tjepkema [34] and Peters et al. [35] provide further discussion of the census cohort database, including details of how the datasets were linked. For each individual the CCR reports the date of diagnosis as well as the site of tumor, and each individual is linked to his or her census information. Linking to the CMD file identifies the date and cause of death for members of the census cohort who died in Canada after 1991, information necessary for our duration analysis. The Census cohort is also linked to the postal code of residence drawn from T1 tax file data. Thus, postal code of residence as of the date of filing is available each year from 1986 to 2008 as long as the individual filed a tax return in that year. This dataset was accessed through the Statistics Canada Research Data Centre Network following Statistics Canada’s standard application, approval, vetting and disclosure policies for access to data contained within the RDC network.

The focus of our analysis is the subset of the 1991 Census cohort for whom location information from the tax filer data is available and can be linked to Cancer Registry data. Since individuals must have been alive and present in Canada as of the Census day, immigrants and returning residents arriving in Canada after 1991 are not part of the Census cohort. A total of 2.7 million individuals can be linked to at least one tax return, including approximately 500,000 immigrants. There are roughly 247,000 cases of cancer diagnoses in this cohort between 1991 and 2003. Individuals diagnosed with cancer prior to 1991 are excluded.

Each individual in the sample is first diagnosed with cancer at a particular point in time or leaves the sample for other reasons before this takes place. By definition the likelihood of diagnosis at a point in time T conditional on this not taking place in the past is

$$ \underset{\triangle t\to 0}{\lim}\left(\frac{\Pr\left(t\le T\le t+\triangle t\right)}{\triangle t}\right)/\left(1-\Pr\left(T\le t\right)\right) $$

where Pr(t ≤ T ≤ t + ∆t) is the probability of diagnosis between a point in time t and a point in time in the future t + ∆t. Pr(T ≤ t) is the probability of being diagnosed at the point in time t or earlier and this is a failure rate. Diagnosis at a point in time conditional on this not taking place in the past is related to the dependent variable and the associated failure rates are discussed below.
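
In standard duration-model notation (the symbols F, S, f and h are introduced here for illustration and are not used elsewhere in the paper), the expression above is the hazard rate. Writing F(t) = Pr(T ≤ t) for the cumulative failure probability and S(t) = 1 − F(t) for the survivor function, a sketch of the relationship is:

$$ h(t)=\underset{\triangle t\to 0}{\lim}\frac{\Pr\left(t\le T\le t+\triangle t\right)}{\triangle t}\cdot \frac{1}{1-F(t)}=\frac{f(t)}{S(t)},\kern2em f(t)=\frac{dF(t)}{dt} $$

When time is grouped into calendar years, the discrete analogue is the probability of diagnosis within a year conditional on no diagnosis before the start of that year, which is the quantity the logit model below is specified for.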

The sample reports the calendar day of diagnosis but not the stage of the cancer or when it first began; hence wider time spans are examined. In particular, the time range covered by the sample is divided into 13 calendar years.

(1Jan1991, 31Dec1991] , (1Jan1992, 31Dec1992] ,  …  , (1Jan2003 , 31Dec2003].

Within each year an individual is either diagnosed with cancer for the first time, is not diagnosed with cancer and never has been, or has been diagnosed with cancer in previous years and hence is no longer included in the sample. Many individuals enter the sample on January 1st in 1991 and before this have been at risk. This topic is further discussed below.

After a person enters the sample there is an observation for each year during which he or she is not diagnosed for the first time and remains in the sample. For those who are diagnosed there is also one observation associated with the year in which this takes place. Given the relatively wide time spans discrete duration analysis is applied and a logit model is estimated [16]. In order to examine the robustness of findings to the functional form a Cox proportional hazard model is also estimated. Sueyoshi [41] shows how a logistic model with interval-specific intercepts is consistent with an underlying continuous time duration model in which the within-interval durations follow a log-logistic distribution.
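
The person-year layout described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the tuple layout, and the rule that a person who leaves the sample undiagnosed contributes no row for the exit year are all assumptions made for the example.

```python
from datetime import date

FIRST_YEAR, LAST_YEAR = 1991, 2003  # the 13 calendar years named in the text


def person_period_rows(person_id, diagnosis_date=None, exit_date=None):
    """Expand one person into yearly rows of (person_id, year, diagnosed).

    Assumptions (hypothetical, for illustration only):
    - people diagnosed before FIRST_YEAR are excluded entirely;
    - a person who exits undiagnosed (e.g. death) gets no row for the exit year.
    """
    if diagnosis_date is not None and diagnosis_date.year < FIRST_YEAR:
        return []  # prior diagnosis: not part of the at-risk sample
    rows = []
    for year in range(FIRST_YEAR, LAST_YEAR + 1):
        if diagnosis_date is not None and diagnosis_date.year == year:
            rows.append((person_id, year, 1))  # one row for the diagnosis year
            break
        if exit_date is not None and exit_date.year <= year:
            break  # left the sample without a first diagnosis
        rows.append((person_id, year, 0))  # at risk, not diagnosed this year
    return rows


diagnosed = person_period_rows(1, diagnosis_date=date(1993, 5, 1))
never_diagnosed = person_period_rows(2)
exited = person_period_rows(3, exit_date=date(1995, 7, 15))
```

A discrete-time logit is then an ordinary logistic regression of the `diagnosed` column on covariates measured for each person-year row.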

Under a logit model individual i’s log odds of conditional diagnosis in year t is

$$ \begin{array}{l}\ln \left(\frac{\Pr \left(31{Dec}_{t-1}<{T}_i\le 31{Dec}_t\mid {T}_i>31{Dec}_{t-1}\right)}{1-\Pr \left(31{Dec}_{t-1}<{T}_i\le 31{Dec}_t\mid {T}_i>31{Dec}_{t-1}\right)}\right)\\ {}={\beta}_0+X{1}_{i,t}{\beta}_1+X{2}_i{\beta}_2+{\beta}_3{urban}_{i,t}\\ {}+{\beta}_4\ {immigrant}_i\cdot years\ in\ {Canada}_{i,t}+{\beta}_5\ {immigrant}_i\cdot years\ in\ {Canada}_{i,t}^2\end{array} $$

For each individual there is an observation for each January 1st to December 31st time span examined. The dependent variable is equal to one if a person is diagnosed and equal to zero otherwise. The term X1_{i,t} is a row vector of calendar year, birth year, and age indicators for individual i in time period t and β_1 is the associated column vector of parameters. Individuals are assumed to first be at risk when they turn 25 and, as mentioned above, many individuals enter the sample at an older age. For this reason, and in order to control for changes over calendar time, across birth cohorts, and across age levels, one set of indicator variables is used to account for these three factors [36]. This approach imposes very little parametric structure.
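
To make the link between the log odds and the conditional probability of diagnosis concrete, the sketch below evaluates the right-hand side of the logit specification and inverts it. The function and argument names are hypothetical, and the indicator and covariate terms are collapsed into two already-summed scalars for brevity; no estimated coefficients from the paper are used.

```python
import math


def conditional_diagnosis_probability(intercept, x1_effect, x2_effect,
                                      urban_coef, urban,
                                      ysm_coef, ysm_sq_coef,
                                      immigrant, years_in_canada):
    """Probability of first diagnosis in a year, given none before.

    x1_effect and x2_effect stand for the already-summed contributions of
    the calendar-year/birth-year/age indicators and of the fixed
    characteristics; immigrant and urban are 0/1 indicators.
    """
    log_odds = (intercept + x1_effect + x2_effect
                + urban_coef * urban
                + ysm_coef * immigrant * years_in_canada
                + ysm_sq_coef * immigrant * years_in_canada ** 2)
    # Inverse logit: p = 1 / (1 + exp(-log odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


# With every term set to zero the log odds are 0, so the probability is 0.5.
baseline = conditional_diagnosis_probability(0, 0, 0, 0, 0, 0, 0, 0, 0)
```

Because the years-in-Canada terms are multiplied by the immigrant indicator, they drop out of the log odds for the Canadian born regardless of the value passed for `years_in_canada`.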

The row vector X2_i includes individual i’s education level in 1991 (less than high school, high school only, some post-secondary, bachelor’s degree, and higher degree), ethnicity if Canadian born (White, Black, South Asian, West Asian/Mideast, other Asian, Latin American, and other), region of birth if an immigrant (English speaking developed countries including the US, UK and Ireland, Israel and South Africa, continental Europe, Western Asia/Mideast, other Asia, and other regions), and period of arrival in Canada if an immigrant (1980–90, 1970–79, ..., 1930–39, pre-1930). We omit Canadian born Aboriginal people from the sample given the complexities of Aboriginal health outcomes in Canada and consider them in other work. Table 1 provides some descriptive statistics on sample composition by region of birth for immigrants and ethnicity/race for the Canadian-born. It can be seen that even with a large sample, counts for some Canadian born ethnic groups are relatively small.

Table 1 Sample characteristics–immigrant region of birth and Canadian born ethnicity

With a single cross section of data, arrival year and years in Canada are in fact perfectly correlated since year of arrival = current year – years in Canada. Since our dataset is longitudinal, however, these effects can be separately identified because a given arrival cohort can be observed each year over the whole sample period. We prefer to proxy socioeconomic status using education in 1991 rather than income in 1991 since the former is subject to much less variation over time for adults. The other factors examined can vary over an individual’s time at risk. These include an indicator for whether the person resides in a large urban region (city of 500,000+ population) during the year, and for immigrants the number of years spent in Canada measured as a continuous variable and its square. Postal codes in the Census cohort dataset must be mapped to Statistics Canada geocodes in order to assign rural/urban status. Differences in cancer diagnosis between rural and urban areas can arise for a variety of reasons such as extent of exposure to pollution and are well-established in the literature (e.g., [15]).
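
The identification argument for arrival year versus years in Canada can be illustrated with a toy example. The three arrival cohorts below are hypothetical and the helper name is an assumption; the point is only that a single cross section pairs each value of years in Canada with exactly one arrival year, whereas a panel pairs the same value with several.

```python
def arrival_years_by_ysm(observations):
    """Map each years-in-Canada value to the set of arrival years seen with it.

    observations is an iterable of (calendar_year, arrival_year) pairs.
    """
    seen = {}
    for calendar_year, arrival_year in observations:
        ysm = calendar_year - arrival_year  # years in Canada
        seen.setdefault(ysm, set()).add(arrival_year)
    return seen


arrival_cohorts = [1965, 1975, 1985]  # hypothetical arrival years

# One cross section (1991 only) versus the 1991-2003 panel.
cross_section = [(1991, a) for a in arrival_cohorts]
panel = [(y, a) for y in range(1991, 2004) for a in arrival_cohorts]

single = arrival_years_by_ysm(cross_section)
longitudinal = arrival_years_by_ysm(panel)
```

In `single` every key maps to one arrival year, so the two variables cannot be told apart. In `longitudinal`, 16 years in Canada is observed both for the 1975 cohort (in 1991) and for the 1985 cohort (in 2001), which is the variation the paper relies on.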

A limitation of the Census cohort is that there is no person-level information on smoking and other health behaviors. Sanmartin et al. [37] estimate models of smoking behavior using survey data and then use the resulting estimates to predict smoking behavior of individuals in the Census cohort where the predictions are based on a set of variables common to both data sets. Identification is primarily through functional form. To increase variation in predicted smoking behavior and the closeness of the match of predicted smoking to actual but unobserved behavior, we first estimate discrete time hazard models of years until becoming a daily smoker using the retrospective information on starting smoking available in the Canadian Community Health Survey (CCHS). Logit estimations for the conditional probability of starting smoking are estimated separately for 232 distinct subpopulations based on decade turned 10 and age group, (29 categories), region of birth including Canada (4 categories), and gender (2 categories). For each subsample the regression estimated includes controls for education level, specific country of birth, province of residence in Canada, language first learned, race/visible minority status, and tobacco price. Next, using these results, the age-specific mean hazard rate is computed for a large number of subgroups based on education and region of residence in Canada as well as region of birth, gender, age and decade turned 10. These estimates are used to predict the number of people per thousand of each particular set of characteristics who have never started to smoke daily, and this prediction is merged with the cancer data for people with the same set of characteristics and included as a covariate.
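
The last step above, turning age-specific hazards of starting to smoke daily into a predicted number of never-smokers per thousand, can be sketched as a cumulative product of survival probabilities. This is an assumed reading of the procedure rather than the authors' implementation, and the hazard values in the example are made up.

```python
def never_smokers_per_thousand(age_specific_hazards):
    """Predicted never-daily-smokers per 1,000 people in a subgroup.

    age_specific_hazards holds, for each age already lived through, the
    conditional probability of starting to smoke daily at that age given
    not having started earlier (hypothetical inputs).
    """
    surviving = 1.0
    for hazard in age_specific_hazards:
        surviving *= 1.0 - hazard  # share still never-smokers after this age
    return 1000.0 * surviving


# Made-up hazards for two ages: 10% then 20% of remaining never-smokers start.
example = never_smokers_per_thousand([0.10, 0.20])
```

The resulting subgroup-level value would then be merged onto the cancer data by the shared characteristics and entered as a covariate, as the text describes.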

We estimate these models separately for men and women, and allow for clustering of person-specific errors over time. Since the risk factors associated with particular cancers can vary widely, we also estimate eq. (1) separately for the most common cancer types: for women, we estimate the determinants of breast cancer, colorectal cancer and lung cancer; for men, we estimate the determinants of prostate cancer, colorectal cancer and lung cancer.


We begin by estimating simple Logit models separately for men and women that include age/birth cohort/time period controls, an indicator for foreign born status and a quadratic in years-since-migration (YSM). The immigration estimates are presented in Table 2. For any cancer and for the specific cancers considered, and for both men and women, there is clear evidence of a significant initial gap in the likelihood of being diagnosed with cancer upon arrival in Canada for immigrants relative to the Canadian born, and a significant narrowing of this gap with years in Canada towards Canadian born levels of cancer incidence. As can be seen in the top panel of Table 2, for recently arrived immigrant men the odds of being diagnosed with any cancer are only about half of what they are for non-immigrant men (OR 0.517, p-val .000). After arrival, an additional year in Canada increases the odds of being diagnosed with cancer by about 2% (OR 1.02, p-val .000), although the quadratic term in YSM reduces the magnitude of the increase as YSM gets large. A survival plot based on these results (available on request) indicates convergence and leveling off at Canadian-born levels of cancer incidence at around 50 years in Canada. Results for prostate, colorectal and lung cancer for men are remarkably consistent. In the lower panel of Table 2, the odds of recently arrived immigrant women being diagnosed with cancer are only about 64% of those for Canadian born women (OR 0.638, p-val .000). Odds increase with additional years in Canada but at a slower rate than for immigrant men (YSM OR 1.010, p-val .000; YSM2 OR 1.000, p-val .164). A survival plot (available on request) indicates convergence to Canadian born levels at around 60 years in Canada. For specific cancers, patterns are similar but the initial difference in odds varies by cancer site: OR 0.726, p-val .000 for breast cancer, OR 0.538, p-val .000 for colorectal cancer and OR 0.289, p-val .000 for lung cancer. 
Overall this provides clear preliminary evidence of a healthy immigrant effect for cancer overall and by site, and a gradual convergence to Canadian-born levels after many years in Canada. As well, adding in controls for education level, province of residence and urban/rural status does not alter this conclusion. While those other covariates are themselves usually significant, given our focus on immigrants and space limitations we do not discuss those results further in this paper but will make them available on request.
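
Because the model is linear in the log odds, the immigrant odds ratio after a given number of years in Canada is the arrival odds ratio multiplied by the YSM odds ratio raised to the number of years (and the quadratic odds ratio raised to its square). The sketch below shows that arithmetic using the rounded odds ratios quoted for men (0.517 and 1.02). The quadratic odds ratio for men is not reported in the text, so it is set to 1.0 here purely as an assumption, and the function names are hypothetical.

```python
import math


def odds_ratio_vs_canadian_born(or_immigrant, or_ysm, or_ysm_sq, years_in_canada):
    """Immigrant odds ratio relative to the Canadian born after some years."""
    log_or = (math.log(or_immigrant)
              + years_in_canada * math.log(or_ysm)
              + years_in_canada ** 2 * math.log(or_ysm_sq))
    return math.exp(log_or)


def years_to_parity_linear(or_immigrant, or_ysm):
    """Years until the odds ratio reaches 1, ignoring the quadratic term."""
    return -math.log(or_immigrant) / math.log(or_ysm)


# Rounded odds ratios quoted in the text for men; quadratic OR assumed to be 1.
at_arrival = odds_ratio_vs_canadian_born(0.517, 1.02, 1.0, 0)
after_20_years = odds_ratio_vs_canadian_born(0.517, 1.02, 1.0, 20)
linear_parity_years = years_to_parity_linear(0.517, 1.02)
```

The linear-only parity horizon computed this way will not match the roughly 50 years read off the authors' survival plot, since it omits the quadratic term and relies on rounded coefficients; it is meant only to show how the reported odds ratios combine.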

Table 2 Selected Logistic odds ratios of cancer incidence of adults by immigrant status and years in Canada (ysm)

One limitation of treating immigrants as a homogeneous group is that what appears to be convergence to Canadian born levels could instead be a lower intrinsic likelihood of being diagnosed with cancer by more recent immigrant groups. As well, determining how results might vary by immigrant dimensions such as country of birth can yield insights into what might underpin observed differences in the incidence of cancer diagnosis. We estimate Logit models with additional covariates for region of birth and period of arrival as well as additional controls for visible minority status for the Canadian born. (The high degree of correlation between visible minority group and country of birth for immigrants precludes the inclusion of both sets of covariates for immigrants). In Table 3, we present results by gender for cancer in any site while in Tables 4 and 5 we present results for specific cancer sites for men and women respectively. With this set of additional immigrant variables included, the interpretation of the immigrant indicator variable alone is the estimated difference in odds on arrival in Canada for the base group of immigrants who arrived prior to 1930 from the United States, while the estimated odds ratios for period of arrival and region of birth modify this effect. As well, with visible minority indicators included for native-born Canadians, the underlying comparison group is white Canadian born men. In Table 3 it can be seen that after controlling for other factors, cancer diagnosis varies significantly across regions of birth and time in Canada. For men, the odds ratio is significantly less than one while the odds ratio for years in Canada is significantly greater than one. Together these results indicate a significant healthy immigrant effect for men but a narrowing of the advantage with time spent in Canada. The odds ratios for the arrival period cohort effects are all less than one but not significant. 
Region of birth appears to matter much more, with immigrants from most regions significantly less likely to be diagnosed with cancer in a given year relative to immigrants from the US. For example, the odds of a cancer diagnosis for an immigrant man from South Asia are only half those of an immigrant man from the US (OR 0.494, p-val .000). Even immigrant men from continental Europe have significantly lower odds of being diagnosed with cancer than immigrants from the US (OR 0.905, p-val .001). The only exception is immigrant men from UK/Ireland, whose odds ratio is not significantly different from that of US born men. It should be emphasized that since US born immigrant men are themselves significantly less likely than Canadian born white men to be diagnosed with cancer, the magnitude of the differences between most immigrants and Canadian born white men is even greater. Canadian born men who identify as a visible minority show marked variation in the likelihood of being diagnosed with cancer. Canadian born men of Middle Eastern descent have higher odds of being diagnosed with cancer (OR 1.231, p-val .015), while black and Latin American men are not significantly different from white men, and South Asian men and other Asian men are significantly less likely to be diagnosed with cancer (OR 0.530, p-val .024 and OR 0.758, p-val .000 respectively).

Table 3 Selected Logistic odds ratios by demographic and socioeconomic characteristics of adults (any cancer)
Table 4 Logistic odds ratios by demographic and socioeconomic characteristics of adult men
Table 5 Logistic odds ratios by demographic and socioeconomic characteristics of adult women

For women, the base group of immigrants from the US who arrived prior to 1930 are not significantly different from Canadian born white women in their odds of being diagnosed with cancer, but for every arrival cohort after that the odds are significantly lower than for the base category. As well, immigrant women from every region of birth except UK/Ireland have significantly lower odds of cancer diagnosis than immigrants from the US. The YSM terms are also not significant, implying no change in cancer diagnosis for immigrant women with years in Canada. In sum, the results together indicate that for most immigrant women – those arriving after 1930 and those from countries other than the US and UK/Ireland – the odds of being diagnosed with any cancer are significantly lower than for Canadian born white women but there is little evidence of convergence over time to Canadian born levels. For women, the odds ratios for immigrants and years in Canada are not significant although later arrivals have significantly lower odds of being diagnosed with cancer. For Canadian born visible minority women, all minority groups have lower estimated odds of being diagnosed with cancer than white women but the result is only significant for black women.

In Table 4, we report odds ratios for men for three types of cancer: prostate, colorectal and lung. For Canadian born minorities, small sample sizes preclude estimation for some groups, especially for prostate cancer. The results indicate a significantly higher likelihood of prostate cancer for black Canadian born men (OR 1.498, p-val .003), a result consistent with the US literature. For Canadian born men of other Asian ethnicities, the odds of being diagnosed with prostate cancer are lower than they are for white men (OR 0.769, p-val .013). Canadian born men of West Asian/Mideast ethnicity have higher odds of colorectal cancer (OR 1.596, p-val .021) and for lung cancer (OR 1.509, p-val .030) than white men while other Asian men have lower odds of lung cancer (OR 0.507, p-val .000).

For US born immigrants who arrived prior to 1930 (our base category), the odds of prostate cancer are significantly lower than for non-immigrants on arrival (OR 0.397, p-val .000) but increase with years in Canada (YSM OR 1.044, p-val .000; YSMsq OR 1.000, p-val .000). Arrival period cohort effects are mostly less than one but none is significant at the 5% level. There are however very large differences in the odds of being diagnosed with prostate cancer by country of birth. Relative to US born immigrants (who themselves are significantly less likely to be diagnosed with prostate cancer than Canadian born whites), estimated odds ratios range from a low of 0.506 (p-val .000) for other Asian immigrants and 0.670 (p-val .000) for immigrants from South Asia to a high of 1.588 (p-val .000) for immigrants from the Americas. It should be noted that the net effect for this last group is that their odds of being diagnosed with prostate cancer relative to Canadian-born whites are still significantly less than 1 (results available on request).

For colorectal cancer, the odds ratio for the base group of US born immigrants arriving prior to 1930 is not significantly different from that of Canadian born white men. However, relative to these early arrivals almost all immigrant cohorts arriving after 1930 have significantly lower odds of diagnosis, with the difference being more pronounced for more recent cohorts. The net effect is that for immigrants from the US arriving after 1949, their odds of colorectal cancer diagnosis are significantly lower than those of Canadian born whites (results available on request). As with prostate cancer, there is marked variation in colorectal cancer diagnosis by place of birth. Immigrants from the UK/Ireland have higher odds of a colorectal cancer diagnosis than immigrants from the US (OR 1.275, p-val .010), offsetting the lower likelihood experienced by most arrival cohorts. Immigrants from South Asia and Africa have lower odds of colorectal cancer diagnosis than US born immigrants (OR 0.462, p-val .000 and OR 0.655, p-val .021 respectively) while for others the estimated odds ratio is not significantly different from 1.
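The "net effect" statements for prostate cancer rest on the fact that odds ratios from a logistic model combine multiplicatively, because their underlying coefficients add on the log-odds scale. The following is a minimal sketch of that arithmetic using only the two prostate cancer odds ratios quoted above. It assumes the pre-1930 arrival cohort evaluated at arrival, and it omits the years since migration terms because the squared term is reported only to three decimals. The function and variable names are illustrative and not taken from the paper.

```python
import math

# Odds ratios quoted in the text for prostate cancer among men.
OR_BASE_US_PRE1930 = 0.397  # US born, arrived before 1930, vs Canadian-born whites
OR_AMERICAS_VS_US = 1.588   # born in the Americas, vs US-born immigrants


def combined_odds_ratio(*odds_ratios):
    """Combine odds ratios by summing their log-odds coefficients."""
    return math.exp(sum(math.log(r) for r in odds_ratios))


# Net odds ratio relative to Canadian-born whites, at arrival, pre-1930 cohort.
# Arrival cohort and years-since-migration effects are deliberately left out.
net_at_arrival = combined_odds_ratio(OR_BASE_US_PRE1930, OR_AMERICAS_VS_US)
print(round(net_at_arrival, 3))
```

The product remains below 1, in line with the statement in the text. Whether it is significantly below 1 depends on the covariance of the two estimates, which this sketch does not address.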

For lung cancer, there is no significant difference in the odds of a US born male being diagnosed compared to a Canadian-born white male, nor are any of the arrival period effects and years since migration effects significant. However there are again marked differences by country of birth, with the odds ratio for every region except UK/Ireland significantly less than 1. This implies then that for immigrant men born elsewhere than the US or UK/Ireland the odds of being diagnosed with lung cancer are significantly lower than for Canadian born white men.

Table 5 presents results for women for three cancer sites: breast, colorectal and lung. There are almost no significant differences in the odds of being diagnosed with any of these cancers for Canadian-born visible minority women compared to white women, possibly because of relatively small sample sizes for these groups. The exception is that for lung cancer Canadian-born ‘other Asian’ women have significantly lower odds of cancer diagnosis than Canadian-born white women (OR 0.448, p-val .002).

For the reference group of immigrant women born in the US who arrived prior to 1930, the odds of diagnosis are not significantly different from those of Canadian-born women for any of the cancers considered. Period of arrival in Canada is a significant determinant of breast cancer diagnosis and lung cancer diagnosis, with many more recent arrival groups significantly less likely to be diagnosed with cancer than the earliest arrivals. Only for breast cancer are there changes in incidence with years in Canada, with the net effect a slow increase over time. For region of birth, there is again significant variation, but the overall pattern is of a lower likelihood of diagnosis for each cancer site. Immigrant women from the UK and Ireland have essentially the same odds of being diagnosed with any of the three forms of cancer as immigrant women from the US (and so by extension Canadian born women given the insignificant base group estimate). Immigrant women from continental Europe, South Asia, other Asia and the Americas have significantly lower odds of being diagnosed with breast cancer, with estimated odds ratios 0.824 (p-val .000), 0.658 (p-val .000), 0.737 (p-val .000), and 0.716 (p-val .000) respectively. For colorectal cancer, immigrant women from every region except continental Europe and UK/Ireland have significantly lower odds of cancer diagnosis than the base group of immigrant women born in the US. Similarly, immigrant women from every region except the UK/Ireland have lower odds of being diagnosed with lung cancer.

Extension - Smoking

One potentially important set of confounding factors is health behaviors such as smoking that are major risk factors for many types of cancer. It is well known that smoking rates vary inversely with education levels, and it is also established in the literature that immigrants on average are significantly less likely to smoke than comparable non-immigrants, with marked variation by country of birth, gender and time spent in the host country [17, 20, 29]. One of the major limitations of the Census-cohort (and indeed almost all administrative datasets) is the lack of information on health behaviors. Without controls for differences in health behavior, at least some of the estimated effect of immigrant status on the likelihood of being diagnosed with cancer may reflect these differences. In order to address this, we rely on survey data from Statistics Canada’s Canadian Community Health Surveys.

Selected results are reported in Table 6 for men and Table 7 for women. The predicted smoking measure itself is strongly significant for cancer overall (OR 0.958 p-val .000 for men, OR 0.945 p-val .000 for women) as well as for prostate cancer (OR 0.946 p-val .004) and lung cancers (OR 0.906 p-val .000 for men, OR 0.854 p-val .000 for women), indicating that the higher the probability of an individual never becoming a daily smoker, the lower is the likelihood of a cancer diagnosis. Of more interest is that including this variable in our duration models does not have a large quantitative effect on the estimates of the immigrant controls. Comparing results with and without smoking controls indicates that while in many cases odds ratios with the smoking controls are marginally closer to 1, they remain significant except for female immigrants from Africa and Western Asia for any cancer and for lung cancer. Overall, our estimates of the healthy immigrant effect for cancer diagnosis are not due to group-level differences in smoking behavior.
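The general idea of controlling for smoking with survey data can be illustrated as a two-stage procedure: a group-level smoking measure is first computed from survey respondents and then attached to each cohort record as an additional covariate. The sketch below is a simplified illustration under stated assumptions, not the prediction method used in the paper, which is not described in this section. The survey records are synthetic (not CCHS data), the subgroup keys are hypothetical, and a simple subgroup share stands in for whatever prediction model was actually estimated.

```python
from collections import defaultdict

# Synthetic survey records (NOT real CCHS data): (subgroup, never_daily_smoker).
survey = [
    ("canadian_born", 0), ("canadian_born", 1),
    ("canadian_born", 0), ("canadian_born", 1),
    ("south_asia", 1), ("south_asia", 1),
    ("south_asia", 1), ("south_asia", 0),
]


def never_smoker_share(records):
    """Stage 1: share of respondents in each subgroup who never smoked daily."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [never-daily count, total]
    for group, never_daily in records:
        counts[group][0] += never_daily
        counts[group][1] += 1
    return {group: n / total for group, (n, total) in counts.items()}


def attach_proxy(cohort, shares):
    """Stage 2: add the group-level share to each cohort record as a covariate."""
    return [dict(person, p_never_daily=shares[person["subgroup"]]) for person in cohort]


shares = never_smoker_share(survey)
cohort = [{"id": 1, "subgroup": "south_asia"}, {"id": 2, "subgroup": "canadian_born"}]
augmented = attach_proxy(cohort, shares)
```

Because the attached measure varies only across subgroups, it can absorb group-level differences in smoking but not differences between individuals within a subgroup.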

Table 6 Logistic odds ratios by demographic and socioeconomic characteristics of adult men, with smoking controls
Table 7 Logistic odds ratios by demographic and socioeconomic characteristics of adult women, with smoking controls

Extension - Continuous Time Models

We estimate continuous time Cox proportional hazards models that allow age, location, and years since migration for immigrants to be time varying while other controls are treated as time invariant. Results for both education controls and immigrant controls are comparable to those of the discrete time models in terms of the estimated relationship between these variables and the diagnosis of cancer. We also implement an indirect adjustment method to control for unobserved smoking [38] and it makes little difference to the results (these are not reported but are available on request).
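Time-varying covariates such as age and years since migration are commonly handled by expanding each person into one record per period at risk. The following minimal sketch shows such an expansion for a single person. The field names are hypothetical, the 1991 to 2003 window is an assumption chosen to give 13 years of follow-up, and censoring by death or emigration is ignored for brevity.

```python
def person_years(person, start_year, end_year):
    """Expand one person into yearly records with time-varying covariates.

    `person` is a hypothetical dict with keys birth_year, arrival_year
    (None for non-immigrants) and diagnosis_year (None if never diagnosed).
    """
    rows = []
    for year in range(start_year, end_year + 1):
        arrival = person["arrival_year"]
        ysm = None if arrival is None else year - arrival  # years since migration
        event = int(person["diagnosis_year"] == year)
        rows.append({"year": year, "age": year - person["birth_year"],
                     "ysm": ysm, "event": event})
        if event:
            break  # the person leaves the risk set once diagnosed
    return rows


rows = person_years(
    {"birth_year": 1950, "arrival_year": 1975, "diagnosis_year": 1994},
    start_year=1991, end_year=2003,
)
```

Each row then enters the model as one observation, with `event` as the outcome and `age` and `ysm` updated every year.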


This is one of the first papers to examine whether there is a healthy immigrant effect in terms of cancer incidence and, if so, whether any advantage is lost over time spent in the host country. We are able to account for differences in personal socioeconomic status as measured by the highest level of educational attainment, and we allow for variation in outcomes by region of birth, period of arrival and time in Canada. We also examine the extent to which differences in cancer outcomes by region of origin persist among the Canadian-born descendants of different ethnic groups. Consistent with the literature for other measures of health (e.g., [2, 9, 25]), using a simple set of immigrant controls we find strong evidence that there is a healthy immigrant effect for immigrant men and women after controlling for other factors. We also find that this advantage narrows with years in Canada, converging after around 50 years in the host country. However, allowing for differences by immigrant characteristics, including region of birth, year of arrival in Canada and years spent in Canada, reveals marked differences across immigrant groups.

For immigrant men, the healthy immigrant effect plus convergence over time is evident for any cancer diagnosis and for prostate cancer diagnosis but not for colorectal or lung cancers. For colorectal cancer, arrival period effects indicate that immigrant men arriving after 1930 also have a significantly lower likelihood of a cancer diagnosis. Differences are most pronounced, however, across regions of birth. For example, immigrant men from South Asia have a markedly lower likelihood of any cancer diagnosis or of prostate, colorectal or lung cancer than immigrants from the US and UK/Ireland as well as Canadian born white men. Immigrant men from Africa have a lower likelihood of any cancer and of colorectal and lung cancer compared to US born whites, but not of prostate cancer. McDermott et al. [26] also find broadly similar results in terms of their estimates of age-standardized incidence rates by cancer site. One possible explanation for the lower likelihood of prostate cancer diagnosis might be that the rapid uptake of PSA prostate cancer screening through the 1990s (Dickinson et al. [10]) was not shared by particular immigrant groups, so that the differences in prostate cancer diagnosis reflect differences in screening rather than in actual disease incidence.

For immigrant women, the base group of US-born immigrants who arrived prior to 1930 is not significantly different from Canadian-born women in terms of cancer diagnosis, and it is only for breast cancer that there is some evidence of an increase with years since migration. It may be that increased use of regular cancer screening with additional time in Canada is leading to this effect. For example, Vahabi et al. [42] find mammography screening rates lower among new and recent immigrants to Ontario. However, any arrival period effects or country of birth effects that are significant are uniformly less than 1, implying a lower likelihood of being diagnosed with cancer for most female immigrant subgroups. For example, immigrant women from South Asia are significantly less likely than US arrivals to be diagnosed with any cancer and with breast, colorectal and lung cancers. This is also true of immigrant women from continental Europe except for colorectal cancer and from other Asia except for lung cancer. The persistence of lower occurrence of certain cancers among immigrants from mainly developing countries even after many years in Canada provides useful guidance for continued research on what it is about these individuals that seems protective against cancer. Importantly, the results in most cases are not accounted for by group-level differences in lifetime smoking behavior even though smoking rates for immigrants, even from countries with relatively high smoking rates, are significantly lower than for non-immigrants [17, 29].

For second and earlier generation Canadians who belong to a visible minority, many of the estimated odds ratios are insignificant, implying no difference in the incidence of cancer diagnosis relative to Canadian-born whites. For some groups though there is a lower likelihood of being diagnosed with cancer. Canadian-born men of Middle Eastern, South Asian or other Asian descent are less likely to be diagnosed with any cancer and the same is true of Canadian-born black women. For lung cancer, Canadian-born men and women of other Asian descent have a lower incidence of diagnosis. This suggests that possible barriers to timely cancer screening experienced by some immigrants, such as language difficulties or unfamiliarity with the healthcare system, do not account for the estimated differences, as these are not factors for individuals born and raised in Canada. For other groups though the likelihood of cancer diagnosis is higher than for Canadian-born whites, such as lung cancer for Middle Eastern men and prostate cancer for black men.

A number of limitations to this work should be noted. First, although we have endeavored to control indirectly for group-level lifetime smoking patterns, individual smoking behavior is unknown. Other health behaviors such as diet and alcohol use could also be incorporated from survey data, although these are typically measured as of the survey year only. Second, information on stage at diagnosis, which would help to identify whether differences lie in the timing of diagnosis or in actual incidence, is not available in the dataset for most cancers and for most of the sample period. Third, even with the large sample size, the number of cases of a specific cancer for a specific subgroup may still be insufficient to yield precise estimates, particularly for ethnic subgroups of Canadian born residents. Fourth, while we have endeavored to broadly match immigrant regions of birth and Canadian born visible minority groups for illustration, direct comparisons across generations should not be made because self-reported visible minority status may be open to more subjective interpretation than one’s country of birth. Nevertheless, differences in the likelihood of the diagnosis of certain cancers between immigrants and non-immigrants and between different immigrant groups based on year of arrival, time in Canada and especially country of birth are instructive in guiding future research into what it is about immigrant status that matters for cancer incidence. More broadly, expanding the immigrant health literature to include consideration of validated health conditions from administrative data removes bias that may arise from differences in how individuals self-report such conditions.


This paper finds that immigrants to Canada have a significantly lower likelihood of being diagnosed with cancer, both overall and for particular sites, than otherwise comparable non-immigrant Canadians. There are marked differences in cancer diagnosis by immigrant region of birth, with immigrants from developing regions of the world experiencing the lowest rates of cancer diagnosis. This healthy immigrant effect for cancer declines with additional years in Canada for immigrant men and women overall, but differentiating immigrant groups by period of arrival and country of birth weakens this conclusion, especially for women. For many immigrant groups and for many of the cancer sites considered the gap remains even for long established immigrants. For Canadian born visible minority groups, after controlling for other covariates there are few differences, though for some groups a lower likelihood of being diagnosed with certain cancers is present. The results of this paper will help guide further research on identifying the particular channels through which socio-demographic and economic characteristics drive cancer incidence.



Canadian Community Health Survey

Canadian Cancer Registry

Canadian Mortality Database

Surveillance, Epidemiology and End Results Program


  1. Acevedo-Garcia D, Bates LM. Latino health paradoxes: empirical evidence, explanations, future research, and implications. In: Rodriguez H, Saenz R, Menjivar C, editors. Latino/as in the United States: Changing the face of America. New York, NY: Springer; 2007. p. 101–113.

  2. Acevedo-Garcia D, Bates LM, Osypuk T, McArdle N. The effect of immigrant generation and duration on self-rated health among US adults 2003-2007. Soc Sci Med. 2010;71(6):1161–72.

  3. Aronson KJ, Siemiatycki J, Dewar R, Gérin M. Occupational risk factors for prostate cancer: results from a case-control study in Montréal, Québec, Canada. Am J Epidemiol. 1996;143(4):363–73.

  4. Andreeva V, Unger J, Pentz M. Breast cancer among immigrants: a systematic review and new research directions. J Immigr Minor Health. 2007;9(4):1557–912.

  5. Bashash M, Hislop T, et al. The prognostic effect of ethnicity for gastric and esophageal cancer: the population-based experience in British Columbia, Canada. BMC Cancer. 2011;11(164):1–8.

  6. Clegg LX, Reichman ME, et al. Quality of race, Hispanic ethnicity, and immigrant status in population-based cancer registry data: implications for health disparity studies. Cancer Causes Control. 2007;18(2):177–87.

  7. Clegg LX, Reichman ME, Miller BA, Hankey BF, Singh GK, Lin YD, Goodman MT, Lynch CF, Schwartz SM, Chen VW, Bernstein L, Gomez SL, Graff JJ, Lin CC, Johnson NJ, Edwards BK. Impact of socioeconomic status on cancer incidence and stage at diagnosis: selected findings from the surveillance, epidemiology, and end results: National Longitudinal Mortality Study. Cancer Causes Control. 2009;20(4):417–35.

  8. Dailey A, Kasl S, Holford T, Calvocoressi L, Jones B. Neighborhood-Level Socioeconomic Predictors of Nonadherence to Mammography Screening Guidelines. Cancer Epidemiol Biomark Prev. 2007;16:2293–303.

  9. De Maio F. Immigration as pathogenic: a systematic review of the health of immigrants to Canada. Int J Equity Health. 2010;9(27):1–20.

  10. Dickinson J, Shane A, Tonelli M, Connor Gorber S, Joffres M, Singh H, Bell N. Trends in prostate cancer incidence and mortality in Canada during the era of prostate-specific antigen screening. CMAJ Open. 2016;4(1):E73–9.

  11. Gomez S, Clarke C, Shema S, Chang E, Keegan T, Glaser S. Disparities in Breast Cancer Survival Among Asian Women by Ethnicity and Immigrant Status: A Population-Based Study. AJPH. 2010;100(5):861–9.

  12. Hemminki K, Li X, Czene K. Cancer risks in first-generation immigrants to Sweden. Int J Cancer. 2002;99(2):218–28.

  13. Hemminki K, Li X. Cancer risks in second-generation immigrants to Sweden. Int J Cancer. 2002;99(2):229–37.

  14. Hislop GT, Bajdik CD, Regier MD, Barroetavena MC. Ethnic differences in survival for female cancers of the breast, cervix and colorectum in British Columbia, Canada. Asian Pac J Cancer Prev. 2007;8(2):209–14.

  15. Hausauer A, Keegan T, Chang E, Glaser S, et al. Recent trends in breast cancer incidence in US white women by county-level urban/rural and poverty status. BMC Med. 2009;7:31.

  16. Jenkins S. Survival Analysis. 2005. Downloaded 11/7/2013.

  17. Kennedy S, Kidd M, McDonald JT, Biddle N. The Healthy Immigrant Effect: Patterns and Evidence from Four Countries. Journal of Migration and Integration. 2014;16(2):317–32.

  18. Krewski D, Jerrett M, et al. Extended follow-up and spatial analysis of the American Cancer Society study linking particulate air pollution and mortality. Boston: Health Effects Institute; 2009.

  19. Lee Davis D, Donovan M, Herberman R, Gaynor M, Axelrod D, van Larebeke N, Sasco AJ. The need to develop centers for environmental oncology. Biomed Pharmacother. 2007;61(10):614–22.

  20. Leung LA. Healthy and unhealthy assimilation: country of origin and smoking behavior among immigrants. Health Econ. 23:1411–29.

  21. Lightfoot NE, Berriault CJ. Mortality and cancer incidence in a copper-zinc cohort. Workplace Health Saf. 2012;60(5):223–33.

  22. Link BG, Phelan JC. Understanding sociodemographic differences in health: the role of fundamental social causes. American Journal of Public Health. 1996;86(4):471–3.

  23. Loeb S, Drevin L, Robinson D, Holmberg E, Carlsson S, Lambe M, Stattin P. Risk of localized and advanced prostate cancer among immigrants versus native-born Swedish men: a nation-wide population-based study. Cancer Causes Control. 2013;24(2):383–90.

  24. Mateos P. A Review of Name-based Ethnicity Classification Methods and their Potential in Population Studies. Popul. Space Place. 2007;13:243–63.

  25. McDonald JT, Kennedy S. Insights into the 'healthy immigrant effect': health status and health service use of immigrants to Canada. Soc Sci Med. 2004;59(8):1613–27.

  26. McDermott S, DesMeules M, Lewis R, et al. Cancer Incidence Among Canadian Immigrants, 1980–1998: Results from a National Cohort Study. J Immigr Minor Health. 2011;13:15–26.

  27. Menashe I, Anderson WF, Jatoi I, Rosenberg PS. Underlying causes of the black-white racial disparity in breast cancer mortality: a population based analysis. JNCI. 2009;101:993–1000.

  28. Mousavi SM, Fallah M, Sundquist K, Hemminki K. Age-and time-dependent changes in cancer incidence among immigrants to Sweden: colorectal, lung, breast and prostate cancers. Int J Cancer. 2012;131(2):E122–8.

  29. Newbold KB, Nelligan D. Disaggregating Canadian immigrant smoking behaviour by country of birth. Soc Sci Med. 2012;75(6):997–1006.

  30. Newman L. Breast cancer in African-American Women. Oncologist. 2005;10(1):1–14.

  31. Ng E. Using Canada’s health data. Health Policy Research Bulletin #17 (special issue on immigrant health), 47–49. Ottawa: Health Canada; 2010.

  32. Ng E. The healthy immigrant effect and mortality rates. Health Rep. 2011;22(4):25–9.

  33. Pérez CE. Health status and health behaviour among immigrants. Health Rep. 2002;13(Supplement):1–12.

  34. Peters P, Tjepkema M. 1991–2011 Canadian Census Mortality and Cancer Follow-Up Study. Proceedings of Statistics Canada Symposium 2010. 2010.

  35. Peters P, Tjepkema M, Wilkins R. Data Resource Profile: 1991 Canadian Census Cohort. International Journal of Epidemiology. 2013; Advance Access published 6/9/2013.

  36. Rosenberg P, Anderson WF. Age-Period-cohort models in cancer surveillance research: Ready for prime time? Cancer Epidemiol Biomark Prev. 2011;20(7):1263–8.

  37. Sanmartin C, Finès P, Khan S, Peters P, et al. Modelling risk factor information for linked census data: The case of smoking. Health Rep. 2013;24(6):9–15.

  38. Shin H, Cakmak S, Brion O, Villeneuve P, et al. Indirect Adjustment for multiple missing variables applicable to environmental epidemiology. Environmental Research. 2014;134:482–7.

  39. Singh G, Siahpush M. All-cause and cause-specific mortality of immigrants and native born in the United States. Am J Public Health. 2001;91(3):392–9.

  40. Singh G, Siahpush M. Ethnic-Immigrant Differentials in Health Behaviors, Morbidity, and Cause-Specific Mortality in the United States: An Analysis of Two National Data Bases. Hum Biol. 2002;74(1):83–109.

  41. Sueyoshi GT. A class of binary response models for grouped duration data. J Appl Econ. 1995;10:411–31.

  42. Vahabi M, Lofters A, Kumar M, Glazier RH. Breast cancer screening disparities among urban immigrants: a population-based study in Ontario, Canada. BMC Public Health. 2015;15:679.

  43. Veenstra G. Racialized identity and health in Canada: results from a nationally representative survey. Soc Sci Med. 2009;69(4):538–42.

  44. Ward E, Jemal A, Cokkinides V, Singh G, Cardinez C, Ghafoor A, et al. Cancer disparities by race/ethnicity and socioeconomic status. CA Cancer J Clin. 2004;54(2):78–93.

  45. Wilkins R, Tjepkema M, Mustard C, Choinière R. The Canadian census mortality follow-up study, 1991 through 2001. Health Rep. 2008.


Not applicable.


McDonald and Farnworth are tenured faculty members at the University of New Brunswick. Liu is a full-time employee of the University of New Brunswick paid through grants from the Canada Foundation for Innovation and the Canadian Institutes of Health Research. No funding was attached to this research project.

Availability of data and materials

Because of Statistics Canada access rules that ensure the confidentiality of personal information collected on Canadians under the Statistics Act, the linked Census Cohort as well as the Canadian Community Health Surveys used to generate smoking predictions are publicly available only through one of the Research Data Centres (RDCs) that form the Canadian Research Data Centre Network, a network of 25 secure data facilities on university campuses across Canada. The data used in this paper were accessed through the RDC network following Statistics Canada’s standard application, approval, vetting and disclosure policies for access to data contained within the RDC network.

Authors’ contributions

JM conceived the project, interpreted the results and wrote the paper. MF estimated some of the statistical models and contributed to data setup. ZL estimated some of the statistical models and contributed to data setup. All authors reviewed the paper prior to submission. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Canadian research ethics guidelines do not require a research ethics board review for research that uses only secondary data provided by Statistics Canada through the Research Data Centre network.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Corresponding author

Correspondence to James Ted McDonald.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Cite this article

McDonald, J.T., Farnworth, M. & Liu, Z. Cancer and the healthy immigrant effect: a statistical analysis of cancer diagnosis using a linked Census-cancer registry administrative database. BMC Public Health 17, 296 (2017).
