Cancer and the healthy immigrant effect: a statistical analysis of cancer diagnosis using a linked Census-cancer registry administrative database

Background: A large volume of research has been published on both the socioeconomic and demographic determinants of cancer and on the health of immigrants and minority groups. Yet because of data limitations, little research examines differences in cancer incidence between immigrants and non-immigrants and among immigrants defined by region of birth and time in the host country. In particular, it is not known whether a healthy immigrant effect is present for cancer and, if so, whether this advantage is lost with additional years of residence in the host country.
Methods: This paper uses a large data file from Statistics Canada that links Census information on immigrant status, socioeconomic status including educational attainment, and other person-level information with administrative data on cancer and mortality over a continuous 13-year period of observation. It estimates discrete and continuous time duration models to identify differences in cancer diagnosis by immigrant subgroup after controlling for a variety of potential confounders. Differences in historical smoking behavior are not observable at the individual level in the dataset but are accounted for indirectly using various methods.
Results: Results in general confirm the existence of a healthy immigrant effect for cancer in that, overall, recent immigrants to Canada are significantly less likely than otherwise comparable non-immigrant Canadians to be diagnosed with any cancer and with the most common forms of cancer by site. As well, this gap appears to decline with additional years in Canada for immigrant men and women, eventually converging to Canadian-born levels. Differentiating among immigrant subgroups by period of arrival and country of birth reveals significant variation across immigrant subgroups, with immigrant men and women from developing countries typically having a lower likelihood of being diagnosed with cancer than immigrants from the US, UK and continental Europe. However, controlling for immigrant heterogeneity this way weakens the conclusion that the gap narrows with years in Canada. Immigrant men overall continue to exhibit convergence to Canadian-born levels for diagnosis of any cancer and for prostate cancer, while immigrant women exhibit narrowing over time only for breast cancer. Although smoking behavior is not directly observed, controlling for subgroup-specific lifetime smoking behavior using survey data has only a relatively minor effect on the estimated differences.
Conclusions: The specificity of the results by cancer type, gender, immigrant status and ethnicity provides useful guidance for future research by helping to narrow the possible channels through which social and economic characteristics may be affecting cancer incidence.


Background
A large volume of research has considered the role of both demographic factors such as race or ethnicity and contextual factors such as socioeconomic status and place of residence in affecting the incidence of cancer. Depending on the cancer type, only 2-10% of cancers have been estimated to be the result of a mutation in, or the operation of, a particular gene (Lee Davis, Donovan, et al. [19]). This suggests that the overwhelming majority of cancer-causing factors are linked to individual behaviour, environmental context, or interactions between genes and these other contextual factors. Identifying and evaluating systematic differences in cancer incidence by socioeconomic status or geographic region of residence can advance our understanding of cancer and so help public health agencies to design policies for those at greater risk of the disease. Socioeconomic factors such as education and income have long been recognized as important correlates of cancer incidence and survival. Socioeconomic status may reflect a range of possible cancer determinants that are often not observed in administrative data sources, including health behaviors such as smoking, diet and activity, timely use of preventative health services, cancer screening, delayed childbirth, and occupational exposure to carcinogens (Link and Phelan [22]; Aronson et al. [3]; [8]; Lightfoot and Berriault [21]). Demographic and socioeconomic inequalities in cancer incidence persist even with improved knowledge of risk factors and improvements in early detection and treatment (Clegg et al. [7]).
Much of this work has been limited in terms of available data in one or more dimensions. Data collected as part of a case control study are typically small in scope, which limits generalizability as well as the types of statistical analyses that can be conducted [4]. Health survey data report information on personal demographic and socioeconomic characteristics, but the number of cancer cases among immigrant or minority groups is relatively small and the occurrence of cancer is self-reported. Data drawn from national or state registries are usually limited to basic demographic information on age, sex and location of residence, which means that information on many individual characteristics such as education, immigrant status and personal or household income is not available. As well, by their nature Cancer Registries contain data only on those diagnosed with cancer and not on the underlying at-risk population. The underlying population is approximated using population counts by defined geographic area, usually drawn from a population census. As a result, socioeconomic status must be measured using area-level data on median household income or similarly constructed variables. Unlike its Canadian counterpart, the US SEER databases contain data on race and some state registries report information on country of birth, though sometimes specific place of birth and immigrant status have been imputed from information such as the patient's name and age when a social insurance number is issued [11,24]. Cancer incidence and outcomes vary significantly by race and ethnicity in the US literature, and in particular overall cancer incidence rates are lower among Asians and Pacific Islanders than among non-Hispanic whites [44]. Newman [30] finds that while black women have a lower incidence of all-stage breast cancer, they also experience higher mortality rates (see also [27]). Since race/ethnicity is not recorded in Canadian Cancer Registry data, Hislop et al. [14] and Bashash et al. [5] impute ethnicity based on a review of patient names in the British Columbia Cancer Registry. Clegg et al. [6] compare race, Hispanic status and immigrant status between registry information and self-reports using the SEER-National Longitudinal Mortality Study linked database and find that while the degree of agreement was excellent for race and Hispanic status, it was low for immigrant status.
These limitations have encouraged interest in and development of linked individual-level datasets that combine administrative, survey and/or census information. Clegg et al. (2009) use the US SEER-NLMS dataset that links state cancer registry data to the National Longitudinal Mortality Study, a component of the Current Population Surveys linked to records on cause of death. A key advantage of this dataset is that individual data on socioeconomic status are available. Clegg et al. find consistent gradients in incidence rates by socioeconomic status for various forms of cancer. They find that the incidence of lung cancer and colorectal cancer was lower for individuals with higher socioeconomic status while the incidence of prostate cancer for men and breast cancer for women increased with socioeconomic status (see also [18]). For Canada, McDermott et al. [26] estimate standardized incidence rates for immigrants from many countries of birth and for most forms of cancer based on an administrative dataset that links immigrant landing records with cancer registry and mortality administrative data. Overall, immigrants tend to have lower standardized incidence rates for most types of cancer though exceptions were a relatively higher incidence of cervical, nasopharyngeal and liver cancers among East Asian immigrants. The dataset that McDermott et al. [26] uses is limited in that it includes individual data only on immigrants and also does not have individual data on socioeconomic status. (Linked datasets are more widely available in some European countries such as the Swedish Family Cancer Database, used by [12,13,23,28]). While research on population-level cancer outcomes among immigrants is limited, there is a large body of work on many different dimensions of the health of immigrants, including disease prevalence, health behaviors, and service use such as cancer screening. 
The general conclusion is that a healthy immigrant effect exists whereby recent immigrants are in better health (fewer chronic conditions, better self-reported health, less obesity, fewer negative health behaviors) than otherwise comparable host country non-immigrants (e.g., [1,2,17,25,33]; Singh and Siahpush [39,40]). In a systematic review of the literature on the health of immigrants to Canada, De Maio [9] finds that there is overwhelming support for the healthy immigrant effect across a range of health outcomes. Possible reasons for the healthy immigrant effect include the self-selection of immigrants who may be healthier and wealthier, differences in current and previous health behaviors such as smoking, alcohol consumption and diet, as well as differences in access to health services, the takeup of health screening, and/or differences in attitudes to health maintenance that lead to differences in the incidence of diagnosed conditions [25]. However, for some measures of health this advantage dissipates over years in the host country and converges to native-born levels within 10-15 years or sooner. One limitation of much of this work is that information on health outcomes is self-reported. (See also [31], who reviews data development in Canada relevant to immigrant health research.) One example of recent work using linked survey-administrative data in Canada is Ng [32], who uses Canadian census data linked to vital statistics and finds a healthy immigrant effect in mortality that persists even after 20 or more years in Canada. Ng also finds significant variation by country of birth and area of residence after migration (see also [43] for an examination of race/ethnicity and health in Canada). This paper uses a large linked individual-level dataset on Canadian residents to investigate differences in the incidence of cancer between immigrants and non-immigrants after controlling for a range of demographic and socioeconomic characteristics.
We investigate how cancer incidence varies by country of birth, year of arrival, and time spent in Canada. In particular we are interested in examining whether there is a healthy immigrant effect in terms of the incidence of cancer and whether any gap in cancer incidence narrows with additional years in Canada. In addition, we distinguish between first generation immigrants and Canadian born residents by racial or ethnic groups. We consider the overall incidence of cancer in all sites as well as the incidence of the most common forms of cancer: prostate, colorectal and lung cancer for men, and breast, colorectal, and lung cancer for women. We also examine the robustness of the findings to controlling indirectly for differences in smoking behavior. To our knowledge, at least some parts of each question we examine have not previously been studied in Canada using a large population-based dataset.

Methods
The data were prepared by Statistics Canada and are based on a large cohort of approximately 2.5 million individuals aged 25 or older who participated in the long form of the 1991 Census of Canada. The Census contains detailed information as of 1991 on individual education level, immigrant status, country of birth, year of arrival, ethnicity, occupation, and personal and household income from various sources. Census records on individuals in this cohort are linked to 1990-91 tax records of Canadian tax filers, Canadian Cancer Registry (CCR) data from 1984 to 2003 and the Canadian Mortality Database (CMD) from 1991 to 2008. (Statistics Canada released an updated file with cancer data to 2008 but disclosure restrictions precluded us from using the updated dataset for this project.) Overall approximately 80% of Census respondents were able to be matched to the administrative data. Wilkins et al. [45], Peters and Tjepkema [34] and Peters et al. [35] provide further discussion of the census cohort database, including details of how the datasets were linked. For each individual the CCR reports the date of diagnosis as well as the site of tumor, and each individual is linked to his or her census information. Linking to the CMD file identifies the date and cause of death for members of the census cohort who died in Canada after 1991, information necessary for our duration analysis. The Census cohort is also linked to the postal code of residence drawn from T1 tax file data. Thus, postal code of residence as of the date of filing is available each year from 1986 to 2008 as long as the individual filed a tax return in that year. This dataset was accessed through the Statistics Canada Research Data Centre Network following Statistics Canada's standard application, approval, vetting and disclosure policies for access to data contained within the RDC network.
The focus of our analysis is the subset of the 1991 Census cohort for whom location information from the tax filer data is available and can be linked to Cancer Registry data. Since individuals must have been alive and present in Canada as of the Census day, immigrants and returning residents arriving in Canada after 1991 are not part of the Census cohort. A total of 2.7 million individuals can be linked to at least one tax return, including approximately 500,000 immigrants. There are roughly 247,000 cases of cancer diagnoses in this cohort between 1991 and 2003. Individuals diagnosed with cancer prior to 1991 are excluded.
Each individual in the sample is first diagnosed with cancer at a particular point in time or leaves the sample for other reasons before this takes place. By definition, the likelihood of diagnosis at a point in time T, conditional on this not having taken place in the past, is
$$\Pr(t \le T \le t + \Delta t \mid T \ge t) = \frac{\Pr(t \le T \le t + \Delta t)}{1 - \Pr(T \le t)}$$
where Pr(t ≤ T ≤ t + Δt) is the probability of diagnosis between a point in time t and a point in time in the future t + Δt. Pr(T ≤ t) is the probability of being diagnosed at the point in time t or earlier, and this is a failure rate. Diagnosis at a point in time conditional on this not taking place in the past is related to the dependent variable, and the associated failure rates are discussed below. The sample reports the calendar day of diagnosis but does not report the stage of the cancer or when it first began; hence wider time spans are examined. In particular, the time range on which the sample reports information is divided into 13 calendar years.
Within each year an individual is either diagnosed with cancer for the first time, is not diagnosed with cancer and never has been, or has been diagnosed with cancer in previous years and hence is no longer included in the sample. Many individuals enter the sample on January 1st in 1991 and before this have been at risk. This topic is further discussed below.
After a person enters the sample there is an observation for each year during which he or she is not diagnosed for the first time and remains in the sample. For those who are diagnosed there is also one observation associated with the year in which this takes place. Given the relatively wide time spans, discrete duration analysis is applied and a logit model is estimated [16]. In order to examine the robustness of findings to the functional form, a Cox proportional hazard model is also estimated. Sueyoshi [41] shows how a logistic model with interval-specific intercepts is consistent with an underlying continuous time duration model in which the within-interval durations follow a log-logistic distribution. Under a logit model, individual i's log odds of conditional diagnosis in year t is
$$\ln\left[\frac{P_{i,t}}{1 - P_{i,t}}\right] = X1_{i,t}\,\beta_1 + X2_i\,\beta_2 + X3_{i,t}\,\beta_3 \qquad (1)$$
where P_{i,t} is the probability that individual i is first diagnosed in year t conditional on no diagnosis in earlier years. For each individual there is an observation for each January 1st to December 31st time span examined. The dependent variable is equal to one if a person is diagnosed and equal to zero otherwise. The term X1_{i,t} is a row vector of calendar year, birth year, and age indicators for individual i in time period t and β_1 is the associated column vector of parameters; X2_i and X3_{i,t} denote the time-invariant and time-varying covariates described below, with parameter vectors β_2 and β_3. Individuals are assumed to first be at risk when they turn 25 and, as mentioned above, many individuals enter the sample at an older age. For this reason, and in order to control for changes over calendar time, across birth cohorts, and across age levels, a single set of indicator variables is used to account for these three factors [36]. This approach imposes very little parametric structure.
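The person-year layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` record, its field names and the `person_years` helper are hypothetical and are not taken from the Census cohort file; the 1991-2003 window follows the 13 calendar years mentioned in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Person:
    # Hypothetical record layout, for illustration only.
    person_id: int
    entry_year: int                # first calendar year at risk in the sample
    diagnosis_year: Optional[int]  # year of first diagnosis, None if never diagnosed
    exit_year: int                 # last year observed (death, loss, or end of window)


def person_years(p: Person, last_year: int = 2003) -> List[Tuple[int, int, int]]:
    """Return one (person_id, year, diagnosed) row per calendar year at risk.

    The person contributes a row with diagnosed = 0 for each year without a
    first diagnosis, one row with diagnosed = 1 in the diagnosis year, and
    no rows after that year.
    """
    end = min(p.exit_year, last_year)
    if p.diagnosis_year is not None:
        end = min(end, p.diagnosis_year)
    rows = []
    for year in range(p.entry_year, end + 1):
        rows.append((p.person_id, year, int(p.diagnosis_year == year)))
    return rows


# Someone never diagnosed and observed for the whole window: 13 rows of zeros.
never_diagnosed = person_years(Person(1, 1991, None, 2003))
# Someone first diagnosed in 1995: five rows, the last one flagged as an event.
diagnosed_1995 = person_years(Person(2, 1991, 1995, 2003))
```

Rows built this way would form the dependent variable of a discrete-time logit; the covariates would be merged on by person and year.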
The row vector X2_i includes individual i's education level in 1991 (less than high school, high school only, some post-secondary, bachelor's degree, and higher degree), ethnicity if Canadian born (White, Black, South Asian, West Asian/Mideast, other Asian, Latin American, and other), region of birth if an immigrant (English speaking developed countries including the US, UK and Ireland, Israel and South Africa, continental Europe, Western Asia/Mideast, other Asia, and other regions), and period of arrival in Canada if an immigrant (1980-90, 1970-79, ..., 1930-39, pre-1930). We omit Canadian-born Aboriginal people from the sample given the complexities of Aboriginal health outcomes in Canada and consider this group in other work. Table 1 provides some descriptive statistics on sample composition by region of birth for immigrants and ethnicity/race for the Canadian-born. It can be seen that even with a large sample, counts for some Canadian-born ethnic groups are relatively small. With a single cross section of data, arrival year and years in Canada are in fact perfectly correlated since year of arrival = current year − years in Canada. Since our dataset is longitudinal, however, these effects can be separately identified because a given arrival cohort can be observed each year over the whole sample period. We prefer to proxy socioeconomic status using education in 1991 rather than income in 1991 since the former is subject to much less variation over time for adults. The other factors examined can vary over an individual's time at risk. These include an indicator for whether the person resides in a large urban region (city of 500,000+ population) during the year and, for immigrants, the number of years spent in Canada measured as a continuous variable and its square. Postal codes in the Census cohort dataset must be mapped to Statistics Canada geocodes in order to assign rural/urban status.
Differences in cancer diagnosis between rural and urban areas can arise for a variety of reasons such as extent of exposure to pollution and are well-established in the literature (e.g., [15]).
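The identification argument for arrival year versus years in Canada can be illustrated with a small sketch. The three arrival years below are hypothetical; the code only counts how many distinct values of years in Canada each arrival cohort takes in a single cross-section versus a 1991-2003 panel.

```python
from collections import defaultdict


def ysm_by_cohort(arrival_years, calendar_years):
    """Map each arrival year to the set of years-in-Canada values observed."""
    observed = defaultdict(set)
    for arrival in arrival_years:
        for year in calendar_years:
            observed[arrival].add(year - arrival)
    return dict(observed)


cohorts = (1965, 1975, 1985)  # hypothetical arrival years

# One cross-section: each cohort has exactly one value of years in Canada,
# so the two variables cannot be told apart.
cross_section = ysm_by_cohort(cohorts, [1991])

# A 13-year panel: each cohort is seen at 13 different durations, and the
# same duration is seen for more than one cohort.
panel = ysm_by_cohort(cohorts, range(1991, 2004))
shared_durations = panel[1965] & panel[1975]
```

In the panel, the 1965 and 1975 cohorts overlap at some durations, which is the variation that allows period-of-arrival effects to be separated from years in Canada.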
A limitation of the Census cohort is that there is no person-level information on smoking and other health behaviors. Sanmartin et al. [37] estimate models of smoking behavior using survey data and then use the resulting estimates to predict smoking behavior of individuals in the Census cohort where the predictions are based on a set of variables common to both data sets. Identification is primarily through functional form. To increase variation in predicted smoking behavior and the closeness of the match of predicted smoking to actual but unobserved behavior, we first estimate discrete time hazard models of years until becoming a daily smoker using the retrospective information on starting smoking available in the Canadian Community Health Survey (CCHS). Logit estimations for the conditional probability of starting smoking are estimated separately for 232 distinct subpopulations based on decade turned 10 and age group, (29 categories), region of birth including Canada (4 categories), and gender (2 categories). For each subsample the regression estimated includes controls for education level, specific country of birth, province of residence in Canada, language first learned, race/visible minority status, and tobacco price. Next, using these results, the age-specific mean hazard rate is computed for a large number of subgroups based on education and region of residence in Canada as well as region of birth, gender, age and decade turned 10. These estimates are used to predict the number of people per thousand of each particular set of characteristics who have never started to smoke daily, and this prediction is merged with the cancer data for people with the same set of characteristics and included as a covariate.
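As a rough sketch of the last step, a set of age-specific hazards of starting daily smoking can be turned into a predicted number of never-smokers per thousand. The use of the standard discrete-time survivor function (the product of one minus each hazard) is an assumption about how the prediction is formed, and the hazard values in the example are made up for illustration.

```python
def never_smokers_per_thousand(hazards):
    """Predicted never-daily-smokers per 1,000 people.

    `hazards` holds age-specific conditional probabilities of starting to
    smoke daily; the share who never start is the product of (1 - hazard).
    """
    share_never = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie between 0 and 1")
        share_never *= 1.0 - h
    return 1000.0 * share_never


# Number of subpopulations for which separate logits are estimated:
# 29 decade/age categories x 4 regions of birth x 2 genders.
n_subpopulations = 29 * 4 * 2

# Made-up hazards for three age intervals.
example = never_smokers_per_thousand([0.0, 0.1, 0.2])
```

The subgroup-level value would then be merged onto the person-year cancer records for individuals sharing the same characteristics.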
We estimate these models separately for men and women, and allow for clustering of person-specific errors over time. Since the risk factors associated with particular cancers can vary widely, we also estimate eq. (1) separately for the most common cancer types: for women, we estimate the determinants of breast cancer, colorectal cancer and lung cancer; for men, we estimate the determinants of prostate cancer, colorectal cancer and lung cancer.

Results
We begin by estimating simple logit models separately for men and women that include age/birth cohort/time period controls, an indicator for foreign born status and a quadratic in years-since-migration (YSM). The immigration estimates are presented in Table 2. For any cancer and for each of the specific cancers considered, and for both men and women, there is clear evidence of a significant initial gap in the likelihood of being diagnosed with cancer upon arrival in Canada for immigrants relative to the Canadian born, and a significant narrowing of this gap with years in Canada towards Canadian-born levels of cancer incidence. As can be seen in the top panel of Table 2, for recently arrived immigrant men the odds of being diagnosed with any cancer are only about half of what they are for non-immigrant men (OR 0.517, p-val .000). After arrival, an additional year in Canada increases the odds of being diagnosed with cancer by about 2% (OR 1.02, p-val .000), although the quadratic term in YSM reduces the magnitude of the increase as YSM gets large. A survival plot based on these results (available on request) indicates convergence and leveling off at Canadian-born levels of cancer incidence at around 50 years. Overall this provides clear preliminary evidence of a healthy immigrant effect for cancer overall and by site, and a gradual convergence to Canadian-born levels after many years in Canada. As well, adding in controls for education level, province of residence and urban/rural status does not alter this conclusion. While those other covariates are themselves usually significant, given our focus on immigrants and space limitations we do not discuss those results further in this paper but will make them available on request.
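To see how the reported odds ratios for men combine, the sketch below works on the log-odds scale. The values 0.517 and 1.02 are taken from the text; the quadratic odds ratio is not reported in the text, so the value passed in the last line is purely hypothetical and only shows the direction of its effect.

```python
import math

OR_IMMIGRANT = 0.517  # reported: odds ratio on arrival, any cancer, men
OR_YSM = 1.02         # reported: odds ratio per additional year in Canada


def relative_odds(ysm, or_ysm_sq=1.0):
    """Odds of diagnosis relative to the Canadian born after `ysm` years.

    `or_ysm_sq` is the odds ratio on YSM squared. The default of 1.0
    switches the quadratic term off; any other value is an assumption.
    """
    log_or = (math.log(OR_IMMIGRANT)
              + ysm * math.log(OR_YSM)
              + ysm ** 2 * math.log(or_ysm_sq))
    return math.exp(log_or)


# Years needed to close the gap if only the linear term applied.
years_linear_only = -math.log(OR_IMMIGRANT) / math.log(OR_YSM)

# With a hypothetical quadratic odds ratio slightly below 1, the gap is
# still open at that point, so convergence takes longer.
still_below_one = relative_odds(years_linear_only, or_ysm_sq=0.9999)
```

With the linear term alone the gap would close after roughly 33 years; a quadratic odds ratio below 1 pushes that point later, which is in line with the convergence at around 50 years described above.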
One limitation of treating immigrants as a homogeneous group is that what appears to be convergence to Canadian-born levels could instead reflect a lower intrinsic likelihood of being diagnosed with cancer among more recent immigrant groups. As well, determining how results might vary by immigrant dimensions such as country of birth can yield insights into what might underpin observed differences in the incidence of cancer diagnosis. We estimate logit models with additional covariates for region of birth and period of arrival as well as additional controls for visible minority status for the Canadian born. (The high degree of correlation between visible minority group and country of birth for immigrants precludes the inclusion of both sets of covariates for immigrants.) In Table 3, we present results by gender for cancer in any site while in Tables 4 and 5 we present results for specific cancer sites for men and women respectively. With this set of additional immigrant variables included, the interpretation of the immigrant indicator variable alone is the estimated difference in odds on arrival in Canada for the base group of immigrants who arrived prior to 1930 from the United States, while the estimated odds ratios for period of arrival and region of birth modify this effect. As well, with visible minority indicators included for native-born Canadians, the underlying comparison group is white Canadian-born men (or, in the models for women, white Canadian-born women). In Table 3 it can be seen that after controlling for other factors, cancer diagnosis varies significantly across regions of birth and time in Canada. For men, the odds ratio on the immigrant indicator is significantly less than one while the odds ratio for years in Canada is significantly greater than one. Together these results indicate a significant healthy immigrant effect for men but a narrowing of the advantage with time spent in Canada. The odds ratios for the arrival period cohort effects are all less than one but not significant.
Region of birth appears to matter much more, with immigrants from most regions significantly less likely to be diagnosed with cancer in a given year relative to immigrants from the US. For example, the odds of a cancer diagnosis for an immigrant man from South Asia are only half that of an immigrant man from the US. For women, the base group of immigrants from the US who arrived prior to 1930 are not significantly different from Canadian-born white women in their odds of being diagnosed with cancer, but for every arrival cohort after that the odds are significantly lower than for the base category. As well, immigrant women from every region of birth except UK/Ireland have significantly lower odds of cancer diagnosis than immigrants from the US. The YSM terms are also not significant, implying no significant change in the odds of diagnosis with years in Canada.
For prostate cancer, the odds for US born immigrants who arrived prior to 1930 (our base category) are significantly lower than for non-immigrants on arrival (OR 0.397, p-val .000) but increase with years in Canada (YSM OR 1.044, p-val .000; YSMsq OR 1.000, p-val .000). Arrival period cohort effects are mostly less than one but none is significant at the 5% level. There are however very large differences in the odds of being diagnosed with prostate cancer by country of birth. Relative to US born immigrants (who themselves are significantly less likely to be diagnosed with prostate cancer than Canadian-born whites), estimates for OR range from a low of 0.506 (p-val .000) for other Asian immigrants and 0.670 (p-val .000) for immigrants from South Asia to a high of 1.588 (p-val .000) for immigrants from the Americas. It should be noted that the net effect for this last group is that their odds of being diagnosed with prostate cancer relative to Canadian-born whites are still significantly less than 1 (results on request).
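Because the model is linear in log odds, the odds ratio for a subgroup relative to Canadian-born whites is the product of the relevant odds ratios. The sketch below applies this to the two prostate cancer figures reported above; the helper name is hypothetical, and the calculation deliberately ignores the arrival-period and YSM terms, so it is a point-estimate illustration rather than the paper's net-effect test, which would also require the estimated covariances.

```python
import math


def net_odds_ratio(*odds_ratios):
    """Combine odds ratios by summing their logs (equivalent to a product)."""
    return math.exp(sum(math.log(o) for o in odds_ratios))


OR_BASE_IMMIGRANT = 0.397  # reported: US born, pre-1930 arrivals vs non-immigrants
OR_AMERICAS = 1.588        # reported: immigrants from the Americas vs US born

# Immigrant men from the Americas relative to Canadian-born whites, on arrival,
# leaving aside arrival-period and years-in-Canada effects.
net_americas = net_odds_ratio(OR_BASE_IMMIGRANT, OR_AMERICAS)
```

The combined point estimate is about 0.63, which is consistent with the statement that the net odds for this group remain below 1.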
For colorectal cancer, the odds ratio for the base group of US born immigrants arriving prior to 1930 is not significantly different from that of Canadian born white men. However, relative to these early arrivals almost all immigrant cohorts arriving after 1930 have significantly lower odds of diagnosis, with the difference being more pronounced for more recent cohorts. The net effect is that for immigrants from the US arriving after 1949, their odds of colorectal cancer diagnosis are significantly lower than those of Canadian born whites (available on request). As with prostate cancer there is marked variation in diagnosis with colorectal cancer by place of birth. Immigrants from the UK/Ireland have higher odds of a colorectal cancer diagnosis than immigrants from the US (OR 1.275, p-val .010), offsetting the lower likelihood experienced by most arrival cohorts. Immigrants from South Asia and Africa have lower odds of colorectal cancer diagnosis than US born immigrants (OR 0.462, p-val .000 and OR 0.655, p-val .021 respectively) while for others the estimated odds ratio is not significantly different from 1.
For lung cancer, there is no significant difference in the odds of a US born male being diagnosed compared to a Canadian-born white male, nor are any of the arrival period effects and years since migration effects significant. However there are again marked differences by country of birth, with the odds ratio for every region except UK/Ireland significantly less than 1. This implies that for immigrant men born elsewhere than the US or UK/Ireland the odds of being diagnosed with lung cancer are significantly lower than for Canadian-born white men.
Table 5 presents results for women for three cancer sites: breast, colorectal and lung. There are almost no significant differences in the odds of being diagnosed with any of these cancers for Canadian-born visible minority women compared to white women, possibly because of relatively small sample sizes for these groups. The exception is that for lung cancer Canadian-born 'other Asian' women have significantly lower odds of cancer diagnosis than Canadian-born white women (OR 0.448, p-val .002).
For the reference group of immigrant women born in the US who arrived prior to 1930, there is no significant difference in the odds of cancer diagnosis for each of the cancers considered. Period of arrival in Canada is a significant determinant of breast cancer diagnosis and lung cancer diagnosis, with many more recent arrival groups significantly less likely to be diagnosed with cancer than the earliest arrivals. Only for breast cancer are there changes in incidence with years in Canada, with the net effect a slow increase over time. For region of birth, the general pattern again is that there is significant variation by birth region but overall the pattern is of lower likelihood of a cancer diagnosis for each cancer site. Immigrant women from the UK and Ireland have essentially the same odds of being diagnosed with any of the three forms of cancer as immigrant women from the US (and so by extension Canadian born women given the insignificant base group estimate). Immigrant women from continental Europe, South Asia, other Asia and the Americas have significantly lower odds of being diagnosed with breast cancer, with estimated odds ratios 0.824 (p-val .000), 0.658 (p-val .000), 0.737 (p-val .000), and 0.716 (p-val .000) respectively. For colorectal cancer, immigrant women from every region except continental Europe and UK/Ireland have significantly lower odds of cancer diagnosis than the base group of immigrant women born in the US. Similarly, immigrant women from every region except the UK/Ireland have lower odds of being diagnosed with lung cancer.

Extension - Smoking
One potentially important set of confounding factors is health behaviors such as smoking that are major risk factors for many types of cancer. It is well known that smoking rates vary inversely with education levels, and it is also established in the literature that immigrants on average are significantly less likely to smoke than comparable non-immigrants, though with marked variation by country of birth, gender and time spent in the host country [17,20,29]. One of the major limitations of the Census cohort (and indeed almost all administrative datasets) is the lack of information on health behaviors. Without controls for differences in health behavior, it may be that at least some of the estimated effect of immigrant status on the likelihood of being diagnosed with cancer reflects these differences. In order to address this, we rely on survey data from Statistics Canada's Canadian Community Health Surveys.
Selected results are reported in Table 6 for men and Table 7 for women. The predicted smoking measure itself is strongly significant for cancer overall (OR 0.958, p-val .000 for men; OR 0.945, p-val .000 for women) as well as for prostate cancer (OR 0.946, p-val .004) and lung cancer (OR 0.906, p-val .000 for men; OR 0.854, p-val .000 for women), indicating that the higher the probability of an individual never becoming a daily smoker, the lower is the likelihood of a cancer diagnosis. Of more interest is that including this variable in our duration models does not have a large quantitative effect on the estimates of the immigrant controls. Comparing results with and without smoking controls indicates that while in many cases odds ratios with the smoking controls are marginally closer to 1, they remain significant except for female immigrants from Africa and Western Asia for any cancer and for lung cancer. Overall, our estimates of the healthy immigrant effect for cancer diagnosis do not appear to be driven by group-level differences in smoking behavior.

Extension - Continuous Time Models
We estimate continuous time Cox proportional hazard models that allow for age, location, and years since migration for immigrants to be time varying while other controls are treated as time invariant. Results for both education controls and immigrant controls are comparable in terms of the estimated relationship between these variables and the diagnosis of cancer. We also implement an indirect adjustment method to control for unobserved smoking [38] and it makes little difference to the results (these are not reported but are available on request).
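Time-varying covariates in a Cox model are commonly handled by splitting each person's follow-up into intervals within which the covariates are constant. The sketch below shows one such layout with annual intervals. It is an assumption for illustration, since the paper does not describe its data layout; all names are hypothetical, and location is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    start: int                             # years since start of follow-up
    stop: int                              # end of the interval (start + 1)
    age: int                               # age at the start of the interval
    years_since_migration: Optional[int]   # None for non-immigrants
    event: int                             # 1 if diagnosis falls in this interval


def split_follow_up(
    age_at_baseline: int,
    follow_up_years: int,
    diagnosed: bool,
    ysm_at_baseline: Optional[int] = None,
) -> List[Interval]:
    """Expand one person into annual (start, stop] rows with updated covariates."""
    if follow_up_years < 1:
        raise ValueError("follow_up_years must be at least 1")
    rows = []
    for t in range(follow_up_years):
        is_last = t == follow_up_years - 1
        ysm = None if ysm_at_baseline is None else ysm_at_baseline + t
        rows.append(
            Interval(
                start=t,
                stop=t + 1,
                age=age_at_baseline + t,
                years_since_migration=ysm,
                # The event can only occur in the final interval of follow-up.
                event=1 if (diagnosed and is_last) else 0,
            )
        )
    return rows


# Hypothetical person: aged 50 at baseline, 10 years in Canada, diagnosed in year 3.
for row in split_follow_up(50, 3, diagnosed=True, ysm_at_baseline=10):
    print(row)
```

Rows in this start-stop form can be passed to estimation routines that accept interval data, and the same expansion also yields the person-period records used by discrete time duration models.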

Discussion
This is one of the first papers to examine whether there is a healthy immigrant effect in terms of cancer incidence and if so, whether any advantage is lost over time spent in the host country. We are able to account for differences in personal socioeconomic status as measured by the highest level of educational attainment, and we allow for variation in outcomes by region of birth, period of arrival and time in Canada. We also examine the extent to which differences in cancer outcomes by region of origin persist among the Canadian-born descendants of different ethnic groups. Consistent with the literature for other measures of health (e.g., [2,9,25]), using a simple set of immigrant controls we find strong evidence that there is a healthy immigrant effect for [...]

Table 3. Regressions also include controls for ethnicity of Canadian-born.

[...] among immigrants from mainly developing countries even after many years in Canada provides useful guidance for continued research into what it is about these individuals that seems protective against cancer. Importantly, the results in most cases are not accounted for by group-level differences in lifetime smoking behavior even though smoking rates for immigrants, even from countries with relatively high smoking rates, are significantly lower than for non-immigrants [17,29]. For second and earlier generation Canadians who belong to a visible minority, many of the estimated odds ratios are insignificant, implying no difference in the incidence of cancer diagnosis relative to Canadian-born whites. For some groups, though, there is a lower likelihood of being diagnosed with cancer. Canadian-born men of Middle Eastern, South Asian or other Asian descent are less likely to be diagnosed with any cancer, and the same is true of Canadian-born black women. For lung cancer, other Canadian-born Asian men and women have a lower incidence of diagnosis.
This suggests that possible barriers to timely cancer screening experienced by some immigrants, such as language difficulties or unfamiliarity with the healthcare system, do not account for the estimated differences, as these are not factors for individuals born and raised in Canada. For other groups, though, the likelihood of cancer diagnosis is higher than for Canadian-born whites, such as lung cancer for Middle Eastern men and prostate cancer for black men.
A number of limitations to this work should be noted. First, although we have endeavored to control indirectly for group-level lifetime smoking patterns, individual smoking behavior is unknown. Other health behaviors such as diet and alcohol consumption could also be incorporated from survey data, although they are typically measured as of the survey year only. Second, information on stage at diagnosis, which would help to identify whether differences are in the timing of diagnosis or in actual incidence, is not available in the dataset for most cancers and for most of the sample period. Third, even with the large sample size, the number of cases of a specific cancer for a specific subgroup may still be insufficient to yield precise estimates, particularly for ethnic subgroups of Canadian-born residents. Fourth, while we have endeavored to broadly match immigrant regions of birth and Canadian-born visible minority groups for illustration, direct comparisons across generations should not be made because self-reported visible minority status may be open to more subjective interpretation than one's country of birth. Nevertheless, differences in the likelihood of the diagnosis of certain cancers between immigrants and non-immigrants and between different immigrant groups based on year of arrival, time in Canada and especially country of birth are instructive in guiding future research into what it is about immigrant status that matters for cancer incidence. More broadly, expanding the immigrant health literature to include consideration of validated health conditions from administrative data removes bias that may arise from differences in how individuals self-report such conditions.

Conclusions
This paper finds that immigrants to Canada have a significantly lower likelihood of being diagnosed with cancer, both overall and for particular sites, than otherwise comparable non-immigrant Canadians. There are marked differences in cancer diagnosis by immigrant region of birth, with immigrants from developing regions of the world experiencing the lowest rates of cancer diagnosis. This healthy immigrant effect for cancer declines with additional years in Canada for immigrant men and women overall, but differentiating immigrant groups by period of arrival and country of birth weakens this conclusion, especially for women. For many immigrant groups and for many of the cancer sites considered, the gap remains even for long-established immigrants. For Canadian-born visible minority groups, after controlling for other covariates there are few differences, though for some groups a lower likelihood of being diagnosed with certain cancers is present. The results of this paper will help guide further research on identifying the particular channels through which socio-demographic and economic characteristics drive cancer incidence.

Abbreviations
CCHS: Canadian Community Health Survey; CCR: Canadian Cancer Registry; CMD: Canadian Mortality Database; SEER: Surveillance, Epidemiology and End Results program