Participants
African Americans (N = 1,021; 683 women and 338 men) were recruited in 2009–2010 to complete a telephone survey. Calls were made using a targeted list sample, created from random digit dial (RDD) generated lists matched to a market research data sample and developed to ensure that major geographical regions were represented. In addition to this list, a separate RDD list was purchased and used in calling to reduce biases produced by a listed sample. The samples were drawn by Info USA, a company that specializes in developing targeted list samples for low-incidence populations. Eligibility criteria included birth in the United States, self-identification as an African American man or woman aged 50 to 75, a mailing address (for mailing of incentives), and a working telephone number.
Procedures
The Washington University in St. Louis Institutional Review Board approved this study and the consent procedures used. Listed individuals were contacted by phone through a call center. Battelle Centers for Public Health Research and Evaluation trained the callers and completed the calling and survey administration. Telephone recruiters stated that researchers were recruiting participants for a study of attitudes that may relate to cancer screening, explained eligibility criteria, described the project, and encouraged eligible men and women to participate. If two eligible individuals resided at the residence associated with the telephone number, the Computer Assisted Telephone Interview (CATI) system used a pre-selected random number for the sampled household to determine the respondent. If more than two eligible individuals lived in the household, the most recent birthday determined who was selected as the respondent; if the person answering was unable to give the birthdays, first names were used to determine which eligible adult to select. Figure 1 provides a flow chart of the final study population.
Following eligibility screening, participants provided verbal consent and completed the survey, which included CRCS and attitude items, cultural variables, and demographic information. The survey took approximately 35 minutes to complete by telephone. Five percent of participants (n = 50) were asked to consent to a re-administration of the survey to establish test-retest reliability. Participants in the test-retest group received the follow-up call two weeks after completing the survey.
Measures
The NCI Self-Report Measure of CRCS was used to assess CRC screening behavior and family history [20]. Experience with three screening tests is assessed: fecal occult blood test (FOBT), sigmoidoscopy (SIG), and colonoscopy (COL) [21–24]. Current concordance estimates were all ≥ .80, as were sensitivity and specificity estimates [25]. Kappa statistics for FOBT and SIG were 0.71 and 0.73, respectively. The agreement for COL was almost perfect, 0.89, using Landis & Koch (1977) criteria [26]. Adherence to colorectal cancer screening was determined as follows. Participants who reported FOBT alone within the last year, SIG within the last five years, SIG within the last five years together with FOBT in the last year, or COL within the last seven to ten years (the actual screening interval may be adjusted based on the individual’s CRC risks) were coded as adherent (1); all others were coded as non-adherent (0). The classification criteria are consistent with US Preventive Services Task Force guidelines for CRCS [7].
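The adherence coding described above can be sketched as a small function. This is only an illustration: the argument names, the use of "years since last test" as input, and the default 10-year colonoscopy interval are assumptions, not details taken from the survey instrument.

```python
from typing import Optional


def is_adherent(years_since_fobt: Optional[float],
                years_since_sig: Optional[float],
                years_since_col: Optional[float],
                col_interval_years: float = 10.0) -> int:
    """Return 1 (adherent) or 0 (non-adherent).

    None means the participant never reported that test.
    col_interval_years is a hypothetical parameter: the text gives a
    seven-to-ten-year window that may be adjusted for individual risk.
    """
    fobt_ok = years_since_fobt is not None and years_since_fobt <= 1.0
    sig_ok = years_since_sig is not None and years_since_sig <= 5.0
    col_ok = (years_since_col is not None
              and years_since_col <= col_interval_years)
    # "SIG within five years plus FOBT within one year" is already covered,
    # because either test alone within its window counts as adherent.
    return int(fobt_ok or sig_ok or col_ok)
```

For example, a participant whose only test was a colonoscopy eight years ago is coded adherent under the 10-year default but non-adherent if the interval is tightened to seven years.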
McQueen’s 8-item perceived pros scale (alpha = 0.75) and 10-item cons scale (alpha = 0.78) were administered [27]. These measures were developed to measure participant perceptions of CRCS pros and cons and have been shown to be invariant across gender, race, and prior CRCS. Responses ranged from not important to very important. The pros and cons items provide information on attitudes toward CRC and CRC screening among African Americans; attitudes are identified as an important component of TRA.
A 3-item validated scale to measure absolute perceived risk of CRC was administered [9]. Responses range from strongly disagree to strongly agree. In prior studies, the coefficient alpha was 0.79 in male auto-workers and 0.65 in a sample of black and white primary care patients. In addition to this scale, we included an item to assess participants’ comparative perceived risk relative to others their age and sex, which has been shown to be independent of, but positively associated with, absolute perceived risk for CRC.
Based upon TRA/TPB [28], an individual’s subjective norms reflect his/her beliefs about whether or not important referents approve or disapprove of the behavior and would encourage or discourage him/her to engage in CRCS, as well as motivation to comply with those referents. A positive association between subjective norms and engaging in CRCS has been noted [27]. A validated 4-item measure developed specifically for CRCS was administered to assess family and friends’ influence on CRCS [20]. Responses range from 1 = strongly disagree to 5 = strongly agree. Prior studies reported a coefficient alpha of 0.58 in white male auto workers and 0.61 in a clinic sample.
Individuals who feel confident in their ability to perform the required actions to complete CRCS are better able to overcome barriers and get CRCS. A validated 4-item measure of CRCS self-efficacy (alpha = 0.82) was administered [20]. Response options range from strongly disagree to strongly agree or not at all confident to very confident.
Cultural items addressed medical mistrust, fatalism, religiosity, spirituality, collectivism, communalism, racial and ethnic identity, and privacy. Religiosity/spirituality items addressed the internal manifestation of belief in a higher power and commitment to attendant values [29, 30]. Fatalism items focused on the belief that events are beyond an individual’s control [31]. In cancer research, cancer fatalism [32, 33] is defined as the belief that death is inevitable when cancer is present. Racial identification items referred to a psychological attachment to one of several social categories available to individuals, when the category selected is based on “race” or skin color, common history, nationality, culture, and ancestry [34]. Items covered the centrality, salience, and public and private regard of ethnic identity [35], as well as racial pride, which is an aspect of racial identification [36, 37]. Trust of the medical profession items addressed the belief that individuals and institutions will act appropriately and in a manner consistent with patients’ interests and included behavioral factors, such as the experience of discrimination [38]. Finally, collectivism items assessed the belief that one is linked with family and similar others and holds a cooperative attitude, often leading to personal goals being subordinated to those of the group [39].
Data on age, education, income, occupational status and category, and marital status were collected. Items that addressed access to health care and usual source of care were taken from the 2005 National Health Interview Survey, Adult Access to Health Care and Utilization [40].
Statistical methods/analyses
Descriptive statistics (SPSS, version 17.0, Chicago, IL) were computed to describe the sample and provide scale means and standard deviations.
An associative data mining algorithm [15] was applied in order to explore unknown and potentially relevant relationships among variables found in this dataset. Before using the data mining algorithm, we first divided the dataset into two classes: $C_1$: adherence ($n_1$ = 608); $C_2$: non-adherence ($n_2$ = 411). Each option to one question is treated as an independent and distinct item. For example, if there are four options to one question ($Q_1$), then four items ($i_1$, $i_2$, $i_3$, $i_4$) with a distinct identification code are generated for $Q_1$. Therefore, after this pre-processing, each participant record is represented by a set of disjoint coded items. The complete list of items is considered as candidate variables, which, later on, are used as the input of this associative mining algorithm.
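The pre-processing step can be illustrated with a minimal sketch. The `question=option` code format and the function names are assumptions made for the example; the text only states that each option receives a distinct identification code.

```python
from typing import Dict, FrozenSet, List


def item_catalog(options: Dict[str, List[str]]) -> List[str]:
    """List every candidate item: one distinct code per (question, option)."""
    return [f"{q}={opt}" for q, opts in options.items() for opt in opts]


def encode_record(answers: Dict[str, str]) -> FrozenSet[str]:
    """Turn one participant's answers into a set of coded items.

    Each question contributes exactly one item, so items from the same
    question never co-occur in a record.
    """
    return frozenset(f"{q}={opt}" for q, opt in answers.items())


# A question with four options yields four candidate items.
catalog = item_catalog({"Q1": ["1", "2", "3", "4"]})
record = encode_record({"Q1": "2", "Q2": "1"})
```

Using a `frozenset` per record keeps the later subset checks (whether a record contains an itemset) to a single comparison.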
The first step of this algorithm is to identify all frequent itemsets for each group, which are combinations of disjoint items, by calculating support values. Based on the number of unique items in each dataset, there are around 5 choices per question (720 unique items, 144 questions). If an exhaustive approach were applied to search all possible itemsets, on the order of $5^{144}$ combinations could be generated as potential frequent itemsets. We applied the traditional Apriori algorithm [41] on a Hadoop [42] cluster using Spark [43] to streamline the frequent itemset extraction process. The support threshold for each group was set to 0.6, meaning each discovered frequent itemset occurred in at least 60% of participant records in the adherent or non-adherent group. This value was chosen empirically after multiple runs of the data with support varied in steps of 10%. Supports below 60% generated too many itemsets, which were indicative of the population as a whole (non-descriptive). In contrast, supports above 60% filtered out valuable discoveries. The support value of itemset $(i_1, i_2)$ in group $C_k$ is defined as:

$$\mathrm{Support}\big((i_1, i_2), C_k\big) = \frac{\text{number of records in } C_k \text{ containing both } i_1 \text{ and } i_2}{n_k}$$
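As a rough illustration of this step, the sketch below computes support within one group and runs a level-wise (Apriori-style) search. It assumes the usual definition of support, the fraction of a group's records that contain every item in the itemset, and it is a single-machine toy, not the Hadoop/Spark implementation used in the study. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Record = FrozenSet[str]


def support(itemset: FrozenSet[str], records: List[Record]) -> float:
    """Fraction of records in one group that contain every item in itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_itemsets(records: List[Record],
                      min_support: float = 0.6) -> Dict[FrozenSet[str], float]:
    """Level-wise search: size-k candidates are built only from frequent
    size-(k-1) itemsets, which avoids enumerating every combination."""
    current = {frozenset([i]) for r in records for i in r}
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        kept = {}
        for cand in current:
            s = support(cand, records)
            if s >= min_support:
                kept[cand] = s
        result.update(kept)
        k += 1
        candidates = set()
        for a, b in combinations(list(kept), 2):
            union = a | b
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if len(union) == k and all(
                    frozenset(sub) in kept
                    for sub in combinations(union, k - 1)):
                candidates.add(union)
        current = candidates
    return result
```

The function would be called once per group (adherent and non-adherent records separately), so that each group gets its own table of frequent itemsets and supports.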
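Once an itemset is found to be frequent, the algorithm calculates its confidence value in order to decide whether it forms a significant association rule R for a specific adherence group, for example $\{i_1, i_2\} \rightarrow C_1$:

$$\mathrm{Confidence}\big(\{i_1, i_2\} \rightarrow C_1\big) = \frac{\text{number of records in } C_1 \text{ containing } \{i_1, i_2\}}{\text{number of records in } C_1 \cup C_2 \text{ containing } \{i_1, i_2\}}$$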
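A minimal sketch of the confidence calculation follows, assuming the standard definition of rule confidence: among all records that contain the itemset, the share that belongs to the target group. The function and argument names are hypothetical.

```python
from typing import FrozenSet, List


def confidence(itemset: FrozenSet[str],
               target_records: List[FrozenSet[str]],
               other_records: List[FrozenSet[str]]) -> float:
    """Confidence of the rule itemset -> target group."""
    hits_target = sum(1 for r in target_records if itemset <= r)
    hits_other = sum(1 for r in other_records if itemset <= r)
    total = hits_target + hits_other
    # An itemset that never occurs supports no rule at all.
    return hits_target / total if total else 0.0
```

Because the two groups differ in size, a high confidence can partly reflect the larger group's base rate, which is one reason to compare group-specific supports as well.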
The frequent itemsets were then filtered so that only maximal itemsets (those not contained in any larger frequent itemset) remained. In order to find class-specific rules, two methods were used to find itemsets that could be used in a clinical setting, both of which are based on contrast set mining [44]:
(1) M1: identifying frequent itemsets that are shared by both adherence groups ($C_1$ and $C_2$) with a significant support difference (at least 20%) between the two groups: {Itemsets(M1) | $S_k$ ∈ (Itemsets($C_1$) ∩ Itemsets($C_2$)), where |Support($S_k$, $C_1$) − Support($S_k$, $C_2$)| ≥ 20%}.
(2) M2: identifying items ($I_i$’s) that are part of a frequent itemset $S_{k,C_1}$ = {($I_i$’s) ($I_j$’s)} in only one of the adherence groups ($C_1$), where a subset of that frequent itemset (the $I_j$’s) is also a frequent itemset in the other adherence group ($C_2$). This method helps us find attributes that are strongly shared between groups but that, when extra attributes are added, become skewed towards one class or the other.
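The filtering step and the two contrast methods can be sketched as follows. The inputs are assumed to be dictionaries mapping each frequent itemset to its support within one group; the function names, the return shapes, and the order in which filtering and contrast steps are combined are illustrative assumptions rather than the study's implementation.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]


def maximal_only(itemsets: Iterable[Itemset]) -> Set[Itemset]:
    """Keep itemsets that are not a proper subset of another itemset."""
    sets = list(itemsets)
    return {s for s in sets if not any(s < t for t in sets)}


def m1_shared_contrast(freq_c1: Dict[Itemset, float],
                       freq_c2: Dict[Itemset, float],
                       min_diff: float = 0.2
                       ) -> Dict[Itemset, Tuple[float, float]]:
    """M1: itemsets frequent in both groups whose supports differ by
    at least min_diff."""
    shared = freq_c1.keys() & freq_c2.keys()
    return {s: (freq_c1[s], freq_c2[s]) for s in shared
            if abs(freq_c1[s] - freq_c2[s]) >= min_diff}


def m2_distinguishing_items(freq_c1: Dict[Itemset, float],
                            freq_c2: Dict[Itemset, float]
                            ) -> List[Tuple[Itemset, Itemset, Itemset]]:
    """M2: for each itemset frequent only in C1, report each proper subset
    that is frequent in C2 together with the extra items (the I_i's)
    that make the full itemset specific to C1."""
    out = []
    for s in freq_c1:
        if s in freq_c2:
            continue
        for shared in freq_c2:
            if shared < s:
                out.append((s, shared, s - shared))
    return out
```

Swapping the two arguments of `m2_distinguishing_items` gives the items that skew an itemset towards the other group. Supports that differ by exactly the threshold may fall on either side because of floating-point rounding, so a small tolerance may be appropriate in practice.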
The findings from both methods were then entered into SPSS (version 17.0) for logistic regression analyses to test their statistical significance.