Study Cohort and Sample Selection
Complete documentation of the design, procedures, and protocols for the AHEAD cohort is available online at http://hrsonline.isr.umich.edu or elsewhere. We used weighted analyses to adjust for the unequal probabilities of selection that over-sampling African Americans, Hispanics, and Floridians created. An 80.4% response rate was obtained for the baseline (1993-1994) interviews, yielding 7,447 participants ≥ 70 years old (i.e., the 775 spouses < 70 years old were excluded).
An aggressive approach to maximize the completion of the biennial follow-up intervals was used (see http://hrsonline.isr.umich.edu/sitedocs/sampleresponse.pdf for specific details, including the number of participants eligible, re-interviewed, or lost to follow-up at each stage of the study). Basically, at each biennial follow-up, multiple attempts were made to re-interview all surviving participants, preferably as self-respondents, although proxy-interviews were allowed (proxy-respondent rates rose from 10.6% at baseline to 17.2% by 2006). This included re-interviews with participants who were in a nursing home at the time of the scheduled follow-up interview. The only exception to this aggressive follow-up protocol was for the 269 participants who at some point between 1995 and 2007 asked never to be contacted again. Among those who were lost to follow-up at any given biennial interview, an average of 5.5 re-interview attempts were made. The response rate among survivors at the biennial follow-up interviews ranged from a low of 93.8% in 1995-1996 to a high of 95.9% in 2006-2007, with 81.8% of the participants re-interviewed at every possible re-interview round for which they were eligible.
We limited the analytic sample to a maximum of 4,251 participants (56.5% of the total age-eligible cohort). First, we excluded 523 (7.0%) for whom proxies provided data at baseline because they were not asked the same cognitive status measures, and thus changes in cognitive function could not be assessed for them. Second, 786 (10.6%) were excluded because their baseline interviews could not be linked to their Medicare claims, and a focal interest was the association of hospitalization patterns with changes in cognitive function. Third, 610 (8.2%) who were in managed Medicare at baseline were excluded because managed care plans have different claims reporting requirements for Part B (non-institutional) claims and it is generally recommended that claims-based analyses should be limited to beneficiaries in fee-for-service Medicare. Fourth, we excluded 1,077 (14.5%) because they did not have any post-baseline follow-up interviews as self-respondents, and thus changes in cognitive function could not be assessed for them. Finally, missing data on the outcome measures further reduced the analytic samples to 3,021 for the TICS-7, to 4,251 for the immediate word recall test, and to 4,068 for the delayed word recall test, reflecting higher refusal rates on the TICS-7 over time. Included participants were censored for three competing risks--death, enrolling into managed Medicare, or the 2006-2007 follow-up interviews--whichever came first.
To gauge the extent of selection bias based on these five exclusion criteria, we conducted a series of one-way analyses of variance on selected demographic (age, sex, and race), socioeconomic (education and income), and functional status (limitations in performing activities of daily living [ADLs], instrumental ADLs [IADLs], and mobility) characteristics. We constructed a categorical (i.e., nominal) measure reflecting the six possible participant groups including those excluded due to proxy-interviews at baseline, inability to link to Medicare claims, participation in managed Medicare plans, not having any follow-up interviews as a self-respondent, and refusing to answer any of the cognitive function assessments vs. being included in the analytic sample. In the Dunnett post-hoc test comparisons, participants included in the analytic sample were designated as the reference (or comparison) group. As expected, those results (not shown) indicated that participants in the analytic sample were younger, less likely to be a man or a member of a racial minority, and had more education, better income, and better functional status. Thus, our analytic sample was considerably advantaged relative to the members of the AHEAD cohort who were excluded from the analysis, which biases our prevalence estimates for cognitive decline downward and our estimated associated risks toward the null.
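The one-way analysis of variance described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the group labels, and the age values are hypothetical, only three of the six participant groups are shown, and the Dunnett post-hoc comparisons against the analytic sample are not implemented here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of participant groups
    n = len(all_values)   # total number of observations
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical ages (illustrative values only, not AHEAD data).
ages_by_group = {
    "analytic_sample": [71, 72, 73],
    "managed_medicare": [72, 73, 74],
    "proxy_at_baseline": [75, 76, 77],
}
f_stat = one_way_anova_f(list(ages_by_group.values()))
```

A large F statistic indicates that the group means differ by more than within-group variation would suggest; the post-hoc step then identifies which excluded groups differ from the reference group.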
Cognitive function was measured using three standard assessments taken from the TICS, which originally had 11 components (TICS-11). The TICS-11 was developed for use where a standard cognitive screening test like the Mini-Mental Status Examination (MMSE) could not be used, because the MMSE requires a face-to-face interview for visually presented object naming, pentagon drawing, and paper folding tasks. Thus, the MMSE and similar assessments are not usable in studies like the AHEAD that rely on telephone interviews. The reliability and validity of the TICS-11 and its comparability to the MMSE have been consistently demonstrated.
At baseline, AHEAD participants were asked the following questions in sequence. The first was the immediate word recall test, which taps episodic verbal memory (i.e., working memory, fluid intelligence, or explicit memory). Participants were read a set of 10 words (lake, car, army, forest, ticket, bird, winter, door, mountain, and plant) and then asked to recall as many as they could over the next two minutes. Their score was the number of words (0-10) correctly recalled. The second assessment (TICS-7) asked participants to: (1) correctly identify the month, day, year, and day of the week (0-4 points); (2) count backwards from 20 (2 points if correct on the first try, 1 point if correct on the second try); (3) name a tool used to cut paper (scissors; 1 point); (4) name a prickly plant found in the desert (cactus; 1 point); (5) name the current President of the U.S. (1 point); (6) name the current Vice President of the U.S. (1 point); and, (7) serially subtract 7 from 100 five times (0-5 points). TICS-7 scores ranged from a low of 0 correct answers to a high of 15 correct answers. The third assessment was the delayed word recall test, which also taps episodic memory (i.e., acquired information, and processed and stored memory retrieval). Here again, the score was the number of words (0-10) correctly recalled over the next two minutes.
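The scoring rules just described can be sketched in a few lines. The function and parameter names are hypothetical, and two details are assumptions rather than documented AHEAD procedure: repeated words and words not on the list earn no credit, and matching ignores letter case.

```python
BASELINE_WORDS = ("lake", "car", "army", "forest", "ticket",
                  "bird", "winter", "door", "mountain", "plant")

def score_word_recall(recalled, word_list=BASELINE_WORDS):
    """Count distinct list words recalled (0-10).

    Assumption: repeats and off-list words earn nothing.
    """
    targets = set(word_list)
    return len({w.strip().lower() for w in recalled} & targets)

def score_tics7(date_items, backward_try, scissors, cactus,
                president, vice_president, serial_sevens):
    """Sum the seven TICS-7 components (0-15).

    date_items: correct among month, day, year, day of week (0-4).
    backward_try: 1 if the backward count was correct on the first try,
        2 if on the second try, 0 if not correct.
    serial_sevens: correct serial subtractions of 7 from 100 (0-5).
    """
    if not 0 <= date_items <= 4:
        raise ValueError("date_items must be between 0 and 4")
    if backward_try not in (0, 1, 2):
        raise ValueError("backward_try must be 0, 1, or 2")
    if not 0 <= serial_sevens <= 5:
        raise ValueError("serial_sevens must be between 0 and 5")
    backward_points = {0: 0, 1: 2, 2: 1}[backward_try]
    naming_points = sum(bool(x) for x in
                        (scissors, cactus, president, vice_president))
    return date_items + backward_points + naming_points + serial_sevens
```

Keeping the word recall and TICS-7 scorers separate mirrors the way the three assessments are scored and reported as distinct quantities.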
The initial psychometric reports from the AHEAD cohort combined the three cognitive function assessments into a summary measure ranging from 0 to 35 (TICS-9), as did many subsequent analyses. We, however, treated the TICS-7, and the immediate and delayed word recall tests as three separate measures. The reason was that both exploratory and confirmatory factor analyses have consistently shown that the TICS-7 items and the immediate and delayed word recall tests load on two separate factors, providing strong evidence of the multidimensionality of the underlying latent variable--cognitive function. All of the TICS-7 items loaded principally on the first factor, which clearly reflected general mental status. In contrast, both the immediate and delayed word recall tasks loaded on a second factor that clearly reflected memory. Furthermore, the immediate word recall test measured working memory, fluid intelligence, or explicit memory, whereas the delayed word recall test was generally viewed as a measure of acquired information, and processed and stored memory retrieval, with the latter being a more demanding task and a better prognostic indicator of dementia risk.
Post-baseline, modest changes were made to the three cognitive assessments to minimize learning (or practice) curves associated with prior administration. The immediate and delayed word recall tests were modified in that four alternative 10-word lists were used, with random assignment to one of the alternative lists at each subsequent re-interview. Using alternative forms of word recall lists in this fashion minimizes learning (practice) curves and yields high alternate-forms reliability, with coefficients ranging from 0.67 to 0.90 [19, 20]. The starting number for counting backwards was also changed, and alternative objects were substituted for the scissors and cactus naming questions.
Exposure and Censoring Adjustment
The time periods between the baseline and last follow-up interview were not uniform, with participants having 2-12 years between their baseline and last follow-up interviews. To adjust for this differential exposure (aging effects) to cognitive change we included the number of years (centred at the mean) from the baseline interview to the last follow-up completed as a self-respondent. The quadratic of the centred exposure was also included to test for nonlinearity. Exposure time could be censored based on two criteria. The first was due to entering Medicare managed care plans. To adjust for this we created a single binary indicator reflecting censoring due to entering managed care, regardless of vital status at the end of the study period (January 2008). The second criterion was vital status. To adjust for this we included a set of three binary indicators reflecting surviving until the end of the study period, dying within one year after the last completed follow-up (reflecting terminal decline) [21, 22], or dying more than one year after the last completed follow-up (reflecting delayed deaths) as the reference group.
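A minimal sketch of the exposure and censoring terms described above. All names are hypothetical. Two points are assumptions: death exactly one year after the last completed follow-up is counted as terminal decline, and the vital status coding emits only the non-reference categories, so participants who died more than one year after their last follow-up (the reference group) receive zeros.

```python
from statistics import mean

def centred_exposure_terms(years):
    """Return (linear, quadratic) exposure terms centred at the sample mean."""
    centre = mean(years)
    linear = [y - centre for y in years]
    quadratic = [c ** 2 for c in linear]
    return linear, quadratic

def censoring_indicators(entered_managed_care, years_to_death):
    """Binary censoring indicators for one participant.

    years_to_death: None if alive at the end of the study period, else
        years from the last completed follow-up to death.
    """
    died = years_to_death is not None
    return {
        # Set regardless of vital status at the end of the study period.
        "managed_care_censored": int(bool(entered_managed_care)),
        "survived": int(not died),
        # Assumption: the one-year boundary is inclusive.
        "died_within_1y": int(died and years_to_death <= 1.0),
    }

# Hypothetical follow-up durations in years (illustrative values only).
linear, quadratic = centred_exposure_terms([2, 4, 6, 12])
```

Centring before squaring reduces the correlation between the linear and quadratic terms, which makes the test for nonlinearity easier to interpret.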
Potential risks for changes in cognitive function can be categorized into demographic and socioeconomic characteristics, disease history, health lifestyles and functional status [5–12], all of which were assessed at baseline. Although some of these characteristics, especially functional status, may have changed after baseline, we took a static approach to their assessment because the focal variables among them are generally fixed, and not all participants in the analytic sample were continuously re-interviewed as self-respondents. This approach likely credits any associations in adverse changes in these characteristics with changes in cognitive function to the aging coefficients, potentially resulting in their overestimation. To these we added post-baseline continuity of care, given its beneficial role in primary care for older adults, as well as post-baseline "health shocks" indexed somewhat by physician visits but mostly by hospitalizations [24, 25].
Demographic and socioeconomic factors included age, sex, race, marital status, education, and income. Disease history was measured by a set of 10 binary indicators for whether the participant reported having been told by a physician that she had the particular diseases, and a binary indicator of ≥ 3 of the diseases to tap comorbidity. Body mass index, engaging in vigorous physical exercise, cigarette smoking, and alcohol consumption were used to measure health lifestyles. Functional status included measures of ADLs, IADLs, mobility, vision, hearing, and depressive symptoms. We used a previously validated measure of continuity of care with a primary care physician [26–28] to calculate the percentage of days between baseline and the last follow-up interview for which continuity of care existed (no more than an 8-month interval between office visits to the same primary care physician), and converted it to two binary indicators reflecting the extremes of always or never having continuity.
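One possible reading of the continuity of care calculation is sketched below. It is not the validated algorithm from the cited work, whose details are not reproduced here. The assumptions are that 8 months is approximated as 243 days, that visit dates are expressed as days since baseline for a single primary care physician, and that the days between two consecutive visits count as continuous care only when the gap does not exceed the threshold. All names are hypothetical.

```python
MAX_GAP_DAYS = 243  # assumption: 8 months approximated as 8 * 30.4 days

def continuity_percentage(visit_days, followup_days, max_gap=MAX_GAP_DAYS):
    """Percentage of follow-up days covered by continuity of care.

    visit_days: days since baseline of office visits to one physician.
    followup_days: days from baseline to the last follow-up interview.
    """
    visits = sorted(d for d in visit_days if 0 <= d <= followup_days)
    # Count the days between consecutive visits whose gap is short enough.
    covered = sum(later - earlier
                  for earlier, later in zip(visits, visits[1:])
                  if later - earlier <= max_gap)
    return 100.0 * covered / followup_days

def continuity_indicators(pct):
    """Two binary indicators for the extremes: always or never continuous."""
    return {"always": int(pct == 100.0), "never": int(pct == 0.0)}
```

Participants with intermediate percentages receive zeros on both indicators and so serve as the comparison group under this coding.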
Using the Medicare claims Carrier Statistical Analytic File (SAF) and standard accounting methods we summed post-baseline primary care physician visits across the observation period, and divided this sum by the number of follow-up years. This distribution was then categorized into a set of three binary indicators reflecting the lowest, second, and third vs. the highest quartiles. We measured post-baseline hospital episodes in a similar fashion, with this distribution also categorized into a set of three binary indicators reflecting the lowest (none), second, and third vs. the highest quartiles.
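The annualized visit rate and its quartile coding can be sketched as follows. The names are hypothetical, the cut points come from Python's default `statistics.quantiles` method, and the treatment of values that fall exactly on a cut point (assigned to the lower group) is an assumption. For hospital episodes the lowest category is "none" rather than a true quartile, which this sketch does not model.

```python
from bisect import bisect_left
from statistics import quantiles

def annual_rate(total_events, followup_years):
    """Events summed over the observation period per year of follow-up."""
    return total_events / followup_years

def quartile_indicators(rates):
    """Three binary indicators per participant; the highest quartile is the reference."""
    cuts = quantiles(rates, n=4)  # three cut points
    rows = []
    for rate in rates:
        # 0 = lowest quartile ... 3 = highest; ties at a cut go to the lower group.
        q = bisect_left(cuts, rate)
        rows.append({"q1_lowest": int(q == 0),
                     "q2": int(q == 1),
                     "q3": int(q == 2)})
    return rows

# Hypothetical annual visit rates (illustrative values only).
rows = quartile_indicators([1, 2, 3, 4, 5, 6, 7, 8])
```

Participants in the highest quartile receive zeros on all three indicators, so each coefficient is read as a contrast with the heaviest users.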
There are two common multivariable linear regression approaches to modelling changes in metric (interval level of measurement) outcomes like cognitive function [29–33]. One approach, often referred to as the "delta" approach, involves the use of change (or gain) scores, where the dependent variable is the difference (delta) between the baseline and follow-up values (i.e., time2 - time1) on the outcome of interest. The other approach, often referred to as the "residual change score" approach, involves having the follow-up value on the outcome of interest predicted by the baseline value, as well as other risk factors (or covariates). For the purpose of this study, the residual change score approach was more appropriate.
In its simplest form, the residual change score approach involves using a multivariable linear regression modelling approach that may be depicted as:
$$C_{t_2} = a + b_1 C_{t_1} + b_2 X + e$$
where C is the cognitive function measure (either the TICS-7, or immediate or delayed word recall assessments), the subscripts refer to $t_2$ (the final follow-up) and $t_1$ (the baseline interview), a is the intercept, $b_1$ is the effect of the baseline cognitive function measure on the cognitive function measure at the final interview, X is a vector of other risk factors, $b_2$ is a vector of coefficients for the X vector of other risk factors, and e is the error term. Values of $b_1$, the effect of the baseline cognitive function measure, that are less than 1 reflect regression to the mean, whereas values greater than 1 reflect increasing inequalities (i.e., accumulative advantage and/or accumulative disadvantage), with values of 1 reflecting stability. The $b_2$ vector directly represents the effects of the X vector of other risk factors on changes in cognitive function. Besides its straightforward nature and interpretation, the advantage of the residual change score regression approach in the current context was that it avoided the potential bias of producing spurious correlations of risk factors that may occur in the delta approach when the risk factors are associated with the baseline value of the outcome, which is sometimes referred to as horse-racing bias [34, 35], and Lord's paradox in which the delta approach fails to identify associations evident from residual change score analysis of the same data [36–39].
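The sketch below fits a residual change score model of the form follow-up = a + b1 * baseline + b2 * X + e, which is the form the definitions above imply. It is an illustration on synthetic, noise-free data, not the authors' analysis: the `ols` helper, the variable names, and the data values are all hypothetical, and survey weights are ignored. It also shows a standard algebraic identity: when the change score is regressed on the same predictors, including baseline, the baseline coefficient becomes b1 - 1 and the risk factor coefficient is unchanged, so the two approaches diverge only when the delta model omits the baseline value.

```python
def ols(predictors, outcome):
    """Least-squares coefficients [intercept, b_1, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, outcome))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data (illustrative only): true a = 1.0, b1 = 0.8, b2 = -0.5.
baseline = [3, 5, 7, 9, 4, 8]
risk = [0, 1, 0, 1, 1, 0]
followup = [1.0 + 0.8 * c - 0.5 * x for c, x in zip(baseline, risk)]
rows = list(zip(baseline, risk))

# Residual change score model: follow-up regressed on baseline and risk factor.
a, b1, b2 = ols(rows, followup)

# Change score regressed on the same predictors, for comparison.
delta = [f - c for f, c in zip(followup, baseline)]
_, d1, d2 = ols(rows, delta)
```

Here the recovered b1 is below 1, which under the interpretation given above corresponds to regression to the mean.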
Statistical assumptions about the regression models, including additivity of the effects, were evaluated using standard procedures. Given the large number of potential risk factors, we tested for multicollinearity, but found no problems (all VIFs [variance inflation factors] were below 2.9). Nonetheless, as an added safeguard stepwise and backward model trimming procedures were also used, but yielded equivalent results. The majority of the missing data occurred for the dependent variables due to the use of proxy-respondents at follow-up (see study cohort and sample selection, above). The minimal amount of missing data on the risk factors (independent variables) was addressed using list-wise deletion.
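To put the VIF threshold in context, the conventional definition of the variance inflation factor for risk factor j uses $R_j^2$, the proportion of its variance explained by the other risk factors. Assuming that standard definition was the one applied, the reported bound translates as:
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{VIF}_j < 2.9 \iff R_j^2 < 1 - \frac{1}{2.9} \approx 0.655$$
That is, no risk factor had more than about two-thirds of its variance accounted for by the remaining risk factors.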
Human subject approvals
The research reported here was supported by a USA National Institutes of Health (NIH) grant (R01 AG022913) to Dr Wolinsky. The research and restricted data protection plans associated with this grant were approved by AHEAD on February 20, 2003 (#2003-006). Furthermore, the human subject protocol for this USA NIH grant was fully approved by the University of Iowa Institutional Review Board (IRB-01) on March 24, 2003, and was fully approved again by IRB-01 at each of its annual reviews, the most recent of which occurred on June 7, 2011. A Data Use Agreement (DUA) with the USA Centers for Medicare and Medicaid Services (CMS; DUA 14807) for this study was fully approved on March 3, 2005. Written informed consent was obtained.