California Men's Health Study (CMHS): a multiethnic cohort in a managed care setting

Background We established a male, multiethnic cohort primarily to study prostate cancer etiology and secondarily to study the etiologies of other cancer and non-cancer conditions. Methods/Design Eligible participants were 45-to-69 year old males who were members of a large, prepaid health plan in California. Participants completed two surveys on-line or on paper in 2002 – 2003. Survey content included demographics; family, medical, and cancer screening history; sexuality and sexual development; lifestyle (diet, physical activity, and smoking); prescription and non-prescription drugs; and herbal supplements. We linked study data with clinical data, including laboratory, hospitalization, and cancer data, from electronic health plan files. We recruited 84,170 participants, approximately 40% from minority populations and over 5,000 who identified themselves as other than heterosexual. We observed a wide range of education (53% completed less than college) and income. PSA testing rates (75% overall) were highest among black participants. Body mass index (BMI) (median 27.2) was highest for blacks and Latinos and lowest for Asians, and showed 80.6% agreement with BMI from clinical data sources. The sensitivity and specificity can be assessed by comparing self-reported data, such as PSA testing, diabetes, and history of cancer, to health plan data. We anticipate that nearly 1,500 prostate cancer diagnoses will occur within five years of cohort inception. Discussion A wide variety of epidemiologic, health services, and outcomes research utilizing a rich array of electronic, biological, and clinical resources is possible within this multiethnic cohort. The California Men's Health Study and other cohorts nested within comprehensive health delivery systems can make important contributions in the area of men's health.


Background
In 2002-2003, we established the California Men's Health Study (CMHS), a multiethnic cohort of 84,170 men ages 45 to 69 years who are members of the northern and southern California regions of Kaiser Permanente, the largest managed care organization in California. The major goal of the study is to improve understanding of the development of prostate cancer, given its large impact on morbidity and mortality among American males, especially African-American males [1]. By establishing the cohort within an integrated managed care organization, we have access to rich electronic data sources to supplement clinical data collected by survey and to permit the identification and investigation of other health outcomes, including other cancers and non-cancerous conditions.
We obtained baseline data from the men using a two-stage mailed survey, and we employed unique recruitment techniques to enrich the cohort with non-white men. With this cohort, we plan to build on previous work and investigate novel hypotheses in the broad area of men's health. This paper describes the baseline demographic characteristics, health status, and lifestyle behaviors of the cohort, estimates the expected number of cancer cases, evaluates the validity of the self-reported data, and examines how representative the cohort is of the larger population.

Participant eligibility criteria and identification
Eligible participants included all male California Kaiser Permanente (KP) Health Plan members, aged 45 to 69 years in January 2000, who had been members of the health plan for at least one year at recruitment. We identified potential participants from electronic KP membership files containing each member's birthdate, current address, and membership history. At recruitment, nearly 850,000 health plan members met the eligibility criteria.
The study protocol was reviewed and approved for the protection of human subjects by the Institutional Review Boards of Kaiser Permanente Northern California and Kaiser Permanente Southern California.

Participant recruitment and data collection
We recruited participants beginning in January 2002 and concluding at the end of December 2003 using a two-step process in three mailing waves. In the first step of each mailing wave, we mailed a recruitment letter and a short questionnaire in English/Cantonese to potentially eligible health plan members with Chinese surnames [2,3] or in English/Spanish to all other potentially eligible members. The short questionnaire requested information on the men's race/ethnicity, information not routinely collected by the health plan. In the second step of each mailing wave, we mailed a cover letter and a longer, comprehensive questionnaire to participants who had completed and returned the short questionnaire. The cover letter and questionnaire were in the language in which the participant completed the short questionnaire, and the cover letters accompanying the questionnaires were tailored to the participant's specific race/ethnic group and were signed by a well known athletic (black/African-American) or political (Latino) figure or by one of the study investigators (all others). Members were also offered the option to complete the questionnaires electronically in English on a secure website (NCS Pearson, Inc.).
Recruitment mailings were done in three waves using mailing strategies to enhance recruitment of minority male members. In mailing wave one, we mailed to members residing near KP medical centers serving the highest estimated proportion of African-American men and to members we identified as having Chinese [2,3] or Spanish [4] surnames. In mailing wave two, we mailed to men who either were listed as minority based on hospitalization information or resided near additional KP facilities with high minority populations. In mailing wave three, due to limited study funds, remaining potentially eligible members were mailed a recruitment letter inviting them to complete both questionnaires on the website, and they had the option of requesting a paper questionnaire. Each mailing wave included 200,000 to 350,000 members.

Questionnaires
The short (2-page) questionnaire included questions on race/ethnicity, anthropometrics, prostate specific antigen (PSA) testing history and personal history of prostate cancer or benign prostatic hyperplasia (BPH).
The longer (24-page) questionnaire solicited information on demographics, family history of cancer, health and lifestyle, prostate-related symptoms and conditions, other existing health conditions, medication/drug use, physical activity, tobacco use, diet/supplement use, country of origin, duration of U.S. residency, income, and sexual development, orientation, and health. Lower urinary tract symptoms (LUTS) were assessed using the American Urological Association Symptom Index (AUASI) [5], while erectile dysfunction was assessed by a four-level single question adapted from the Massachusetts Male Aging Study [6]. We assessed diet with a detailed semi-quantitative food-frequency questionnaire adapted from a questionnaire developed for the Women's Health Initiative and other studies [7][8][9] and modified for men's studies of prostate health [10]. We assessed physical activity with questions adapted from the CARDIA Physical Activity History [11][12][13] that queried the men about the frequency and duration of their participation in specific types of moderate and vigorous recreational, household, and work-related activities.
The CARDIA questionnaire has indirect validity against aerobic capacity and percent body fat [11,12] and a strong inverse relation with most cardiovascular disease risk factors [14,15].
We manually reviewed paper questionnaires for stray marks, completeness, and multiple responses. We optically scanned the questionnaires into study computers using ScanTools® II Software (Pearson NCS, Inc, Bloomington, MN). We uploaded the data into SAS data management databases (SAS Statistical Software Version 8.2, SAS Institute, Cary, NC) for cleaning and storage. We used computerized edits to assess the data for logic errors, out-of-range values, missing data, and multiple responses.

Analytical/statistical methods
We constructed the variables reported in the tables from both self-reported and health plan data. We linked study databases with electronic health plan databases using the member's health record number, a unique subject identifier. The health plan data provided information about the participants' KP membership history, hospitalization history, laboratory results, and other health conditions including cancer and diabetes. We used health plan data to validate specific survey data items when comparable health plan data existed.
We created new variables and categorical variables from the raw questionnaire data. Body mass index (BMI), a measure of obesity, was calculated as weight in kilograms divided by the square of height in meters (kg/m²). Participants who did not complete all pages of the food frequency questionnaire (at least five items per page) or whose total calculated energy intake fell outside the range we considered reasonable (less than 800 kcal or greater than 5,000 kcal) were excluded from the nutrient analyses. Recreational physical activity summary scores were derived by multiplying assigned MET values [13] by duration and frequency and summing across activities. We defined vigorous activity as a minimum of 1,260 MET-minutes per week, on average, equivalent to at least 3.5 hours (210 minutes) of activity with a minimum MET level of 6. We defined moderate and vigorous activity combined as a minimum of 630 MET-minutes per week, on average, equivalent to at least 3.5 hours of activity with a minimum MET level of 3.
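The derived variables described above can be illustrated with a minimal sketch. It is not the study's SAS code; the function names and the input layout are hypothetical. It also assumes that the activity thresholds of 1,260 and 630 are in MET-minutes per week, because 3.5 hours (210 minutes) at MET levels of 6 and 3 gives 1,260 and 630 respectively, and that the vigorous score sums only activities with a MET value of at least 6.

```python
# Illustrative sketch only; names and data layout are hypothetical.

VIGOROUS_MIN_MET = 6.0
MODERATE_MIN_MET = 3.0
VIGOROUS_THRESHOLD = 1260.0            # assumed MET-minutes per week (210 min x 6 METs)
MODERATE_VIGOROUS_THRESHOLD = 630.0    # assumed MET-minutes per week (210 min x 3 METs)


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by the square of height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def energy_in_range(kcal, low=800.0, high=5000.0):
    """True if total energy intake is within the range kept for nutrient analyses."""
    return low <= kcal <= high


def met_minutes_per_week(activities, min_met=0.0):
    """Sum MET value x minutes per session x sessions per week.

    `activities` is an iterable of (met_value, minutes, sessions_per_week);
    activities with a MET value below `min_met` are ignored.
    """
    return sum(met * minutes * freq
               for met, minutes, freq in activities
               if met >= min_met)


def is_vigorously_active(activities):
    score = met_minutes_per_week(activities, VIGOROUS_MIN_MET)
    return score >= VIGOROUS_THRESHOLD


def is_moderately_or_vigorously_active(activities):
    score = met_minutes_per_week(activities, MODERATE_MIN_MET)
    return score >= MODERATE_VIGOROUS_THRESHOLD
```

For example, a participant reporting three 70-minute sessions per week of an activity assigned 6 METs would score 1,260 and just meet the vigorous threshold under these assumptions.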
We determined the expected number of cases of prostate, colorectal, lung, and bladder cancer and melanoma based on the baseline age distribution of the cohort of 84,170 men and the age- and race/ethnicity-specific annual (average 1992-1996) cancer incidence rates for the State of California. We assumed a 4% annual rate of attrition, due to mortality and loss to follow-up, based on our experience with other cohort studies.
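A minimal sketch of this kind of expected-case calculation follows. The function name and input layout are hypothetical, and the sketch makes simplifying assumptions that the text does not specify: incidence rates are held constant within each stratum over follow-up (no aging between strata), and the 4% annual attrition is applied as a compounding reduction of the number at risk at the start of each year after the first.

```python
# Illustrative sketch only; the study's exact computation is not described
# in this much detail, so the attrition handling below is an assumption.

def expected_cases(strata, years=5, annual_attrition=0.04):
    """Expected number of incident cases over `years` of follow-up.

    `strata` is an iterable of (n_men, annual_rate_per_100k) pairs, one per
    age and race/ethnicity stratum.
    """
    total = 0.0
    for n_men, rate_per_100k in strata:
        for year in range(years):
            # Number still under follow-up at the start of this year.
            at_risk = n_men * (1.0 - annual_attrition) ** year
            total += at_risk * rate_per_100k / 100_000
    return total
```

With made-up inputs (not study data), a single stratum of 10,000 men and a rate of 100 per 100,000 per year yields about 10 expected cases in the first year and about 19.6 over two years.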
We calculated percent agreement between BMI derived from self-reported weight and height and BMI from clinical databases. We cross tabulated the data grouped according to BMI categories of normal weight (<25), overweight (25-29), and obese (30 or higher), and then calculated the percent agreement as the total number of participants for whom BMI was classified in the same category by both sources divided by the total number of participants with clinical BMI data available.
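The percent agreement calculation can be sketched as follows. The function names are hypothetical, and the sketch assumes that values between 29 and 30 fall in the overweight category and that only participants with both a self-reported and a clinical BMI are passed in.

```python
# Illustrative sketch only; names are hypothetical.

def bmi_category(bmi):
    """Classify BMI as normal weight (<25), overweight (25 to <30), or obese (30+)."""
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def percent_agreement(pairs):
    """Percent of participants placed in the same BMI category by both sources.

    `pairs` is an iterable of (self_reported_bmi, clinical_bmi) tuples, limited
    to participants with clinical BMI data available.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no participants with clinical BMI data")
    same = sum(bmi_category(s) == bmi_category(c) for s, c in pairs)
    return 100.0 * same / len(pairs)
```

For instance, with four made-up pairs of which three fall in the same category under both sources, the function returns 75.0.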

Cohort characteristics
Response rates varied across mailing strategies (Table 1). Short questionnaire responses were substantially higher when we mailed questionnaires (waves one and two) than when we invited the men to complete the questionnaire on-line (wave three). Conversely, among those who completed the short questionnaire, the responses to the long questionnaire were 50% higher among men recruited in wave three compared to men recruited in waves one and two. Overall, the participants completed nearly one-quarter of the short questionnaires and nearly one-third of the long questionnaires on the secure website.
A total of 84,170 men joined the cohort, and nearly 40% were minority (Table 2). Participants were distributed fairly evenly across age groups, although Latino, Asian, and 'other/mixed race/ethnicity' men tended to be somewhat younger, a reflection of the general KP membership age distribution. Educational attainment varied by race/ethnicity, with over half of whites and Asians having earned at least a college degree compared to one-third of black participants and over one-fifth of Latinos. Income distributions also varied by race/ethnicity, with white participants reporting the highest and Latinos reporting the lowest incomes.
One in five participants had a history of BPH, a benign but frequently symptomatic prostate enlargement (Table 3). Rates among white men were the highest and were more than 50% higher than those of Asian men, who experienced the lowest rates. A history of prostatitis, a painful inflammation of the prostate usually due to infection, was also most common among white participants and more than twice as common as among Asians. Nearly half (48.6%) of the cohort reported moderate or severe lower urinary tract symptoms (LUTS), while 29.9% of the men reported sometimes or never being able to get or keep an erection (i.e., erectile dysfunction).
Most participants reported having received PSA testing sometime in the past (Table 3). Black participants were most likely and Latino participants were least likely to have reported receiving a test. We observed only slight race/ethnic variation in the percentage of participants who received fasting glucose, HDL cholesterol, cholesterol panel, or triglyceride laboratory tests within the five years before cohort enrollment. In general, however, Asian participants were most likely and Latino participants were least likely to have received the tests.
We observed that energy and nutrient intakes varied somewhat by race/ethnicity (Table 4). Asian participants reported the lowest consumption of calories and percent calories from fat, fiber, protein and alcohol. Conversely, white participants reported the highest consumption of calories, fiber, and alcohol. Fruit consumption (servings) was similar across race/ethnic groups. The median BMI for the cohort was high with more than 50% of participants considered overweight (Table 4). Blacks and Latinos had the highest and Asians had the lowest median BMI. Nearly two-thirds of participants reported regular moderate or vigorous recreational activities and over 40% reported regular vigorous recreational activities, with the highest rates among white participants. Although smoking rates were low overall, rates among black participants, the highest, were more than double the rates of Asian participants, the lowest.
Over 5,000 participants identified themselves as other than heterosexual, with the highest proportion among white and the lowest proportion among Asian participants (Table 4).
We anticipate that nearly 1,500 prostate cancer diagnoses will occur among participants during the 5 years after cohort inception (Table 5). We also anticipate sufficient diagnoses of lung and colorectal cancers, and other diseases of similar incidence, within 5 years to conduct well powered analyses of etiological factors. There may also be sufficient bladder cancer and melanoma diagnoses to conduct basic etiological analyses.
In addition to cancer outcomes, a substantial number of newly diagnosed cases of cardiovascular and neurodegenerative diseases are expected. Based on estimates in the literature, the expected numbers of new cases among CMHS participants after five years of follow-up (adjusting for loss to follow-up and mortality) include: 5,000 new cases of myocardial infarction, 5,100 cases of congestive heart failure, 180 cases of Parkinson's disease, 7,000 cases of dementia, and 370 cases of Alzheimer's disease.
We compared CMHS participants with male participants in the California Health Interview Survey (CHIS), a population-based telephone survey of 55,000 California residents in 2001 [16] (Table 6). Distributions of participants were similar for the demographic factors examined as well as for BMI. Substantially more CMHS participants than CHIS participants had reported at least one PSA test, probably due to complete insurance coverage among CMHS participants.
We also performed a small comparison of CMHS participants to non-participants of the same age range. A total of 84.4% of CMHS participants had been KP members for at least five years compared to 75.3% of non-participants. The percentages hospitalized during the 5 years before recruitment were similar for the two groups (17.0% of participants versus 15.6% of non-participants), and the prevalence of diabetes mellitus was nearly identical (13.5% for participants versus 13.6% for non-participants).
We validated BMI calculated from self-reported weight and height in relation to BMI reported in health plan clinical databases (Table 7). Overall, the percentage of participants who were classified into the same BMI categories by both sources was over 80%, and the percentage of participants who were classified into the same or adjacent categories was 99.4%.

Discussion
We assembled a large, multiethnic male cohort from the membership of the largest managed care organization in California to study prostate cancer etiology and the etiologies of other cancer and non-cancerous conditions. CMHS participants are diverse in socioeconomic status as well as race/ethnicity, and both ends of the income and education spectrum are well represented in the cohort. The managed care setting is ideal for studying disease risk factors and health services delivery because health care access issues are minimized and supplemental data sources are maximized and can be used to verify and extend the questionnaire data.
The availability of supplemental electronic data is a unique and important strength of this cohort. For example, we can conduct electronic data linkages between cohort and electronic health plan files to obtain current and historical data, obtain information on outcomes other than cancer, and conduct validation studies of self-reported data. Electronic data sources can be used to follow the participants for updated contact information from membership files, cancer diagnoses through cancer registry databases, and deaths through linkage with health plan mortality databases derived from state death certificate files.
Another advantage of these data within this setting is the ability to assess the sensitivity and specificity of the self-reported data on health conditions and PSA testing by comparing these data to clinical databases of the health plan. The BMI from self-report was in high agreement with clinical BMI data, suggesting that data collection by self-report was reliable. However, a limitation of these analyses is the inability to identify diagnoses that predated the participant's health plan membership or the inception date of the electronic database. Another unique strength of this cohort is the potential to conduct efficient biospecimen-based research studies. The health plan maintains tissue banks of tumor and nontumor tissue blocks, some from as early as the 1950s, and the health plan manages a system of clinical laboratories that can be used for the efficient collection of blood specimens. The availability of these resources greatly facilitates research on genetic markers and other molecular studies.
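As an illustration of the sensitivity and specificity comparison mentioned above, the following minimal sketch treats the health plan record as the reference standard, which is an assumption for illustration; the function name and input layout are hypothetical.

```python
# Illustrative sketch only; treats the clinical record as the reference standard.

def sensitivity_specificity(records):
    """Return (sensitivity, specificity) of self-report against clinical data.

    `records` is an iterable of (self_reported, clinical) boolean pairs, e.g.
    whether a participant reported diabetes and whether the health plan data
    show it. A value of None is returned when a denominator is zero.
    """
    tp = fn = tn = fp = 0
    for self_reported, clinical in records:
        if clinical and self_reported:
            tp += 1          # condition present, reported
        elif clinical and not self_reported:
            fn += 1          # condition present, not reported
        elif not clinical and not self_reported:
            tn += 1          # condition absent, not reported
        else:
            fp += 1          # condition absent, reported
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity
```

With made-up data in which 8 of 10 men with a recorded condition reported it and 9 of 10 men without it did not, the function returns a sensitivity of 0.8 and a specificity of 0.9.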
Prospective cohort studies are considered the gold standard of epidemiologic research because many biases that can hamper interpretation of case-control studies are, by design, minimized in prospective cohort studies [17,18]. Cohort studies also facilitate the study of multiple outcomes, unlike other observational study designs. Drawbacks of cohort studies include the long follow-up time often required to obtain sufficient statistical power, typically low recruitment rates, and loss to follow-up that can harm internal validity. However, we anticipate having sufficient statistical power within five years of cohort inception to conduct prostate cancer analyses. Our cost-saving strategies of not re-contacting men who did not respond initially and of mailing the long questionnaire only to men who completed the 2-page questionnaire enhanced the likelihood of obtaining favorable follow-up rates and therefore high internal study validity, because the final cohort included only the men who were willing to complete two questionnaires separated in time.
The relatively low recruitment rates raise the issue of generalizability of the study findings to the broader community, although the recruitment rate in this study is within the range found in other general cohort recruitment efforts. Prospective cohorts are considered valuable partly because well conducted and followed cohorts allow internally valid comparisons to be made [18][19][20][21], similar to clinical trials, which are considered to be internally valid despite inclusion of typically highly selected participants [22]. No cohort is strictly generalizable to the population as a whole. Indeed, research from some of the most widely cited cohorts is derived from study populations that are overwhelmingly from one race/ethnic group [23][24][25][26][27][28][29][30], occupational group or class group [23,24,26,27], or observational studies following clinical trials [31,32,33].
Nonetheless, the CMHS cohort was similar to the population of health plan members on important characteristics and appeared similar to men who responded to a general health survey in California on a variety of important demographic and clinical characteristics. Thus, a cohort such as this one should be able to address biologically relevant questions related to disease onset and progression.
A wide variety of epidemiologic, health services, and outcomes research is possible within this multiethnic cohort. The cohort is available for study by non-Kaiser Permanente investigators and students conditional upon approval by the CMHS Proposal Review Committee. A rich array of electronic, biological, and clinical resources is available to supplement and enhance the survey data and to facilitate the study of outcomes other than cancer while controlling for major health risk factors. The large prospective cohorts in the United States have made enormous contributions to our knowledge of the causes of cancer and other diseases, and we believe the CMHS is poised to make important contributions in the area of men's health.