Reproducibility of physical activity recall over fifteen years: longitudinal evidence from the CARDIA study
© Smith et al.; licensee BioMed Central Ltd. 2013
Received: 28 September 2012
Accepted: 14 February 2013
Published: 28 February 2013
To examine the benefits of physical activity (PA) on diseases with a long developmental period, it is important to determine reliability of long-term PA recall.
We investigated 15-year reproducibility of PA recall. Participants were 3605 White and African-American adults in the Coronary Artery Risk Development in Young Adults study, aged 33–45 at the time of recall assessment. Categorical questions assessed PA before and during high school (HS) and overall PA level at Baseline, with the same timeframes recalled 15 years later. Moderate- and vigorous-intensity scores were calculated from reported months of participation in specific activities.
HS PA recall had higher reproducibility than overall PA recall (weighted kappa = 0.43 vs. 0.21). Correlations between 15-year recall and Baseline reports of PA were r = 0.29 for moderate-intensity scores, and r = 0.50 for vigorous-intensity. Recall of vigorous activities had higher reproducibility than moderate-intensity activities. Regardless of number of months originally reported for specific activities, most participants recalled either no activity or activity during all 12 months.
PA recall from the distant past is moderately reproducible, but poor at the individual level, among young and middle-aged adults.
Abundant research demonstrates inverse associations between physical activity (PA) level and chronic disease risks, including cardiovascular disease, diabetes, hypertension and certain cancers. Although clinical manifestations of diseases such as breast cancer do not typically become evident until middle age or older, the pathological etiology may be associated with PA during young adulthood or childhood. It is therefore important to reliably and validly assess participation in PA over the lifespan to fully examine relationships between PA and chronic disease development.
In epidemiological studies that follow participants from late adulthood, PA participation over the lifespan may be assessed by recall of historical PA. Historical PA is defined as activity engaged in more than one year before the assessment [3–6]. Studies have correlated historical PA with cardiorespiratory fitness and health markers documented in medical records, personal reports to a physician at a prior point in time, and current health markers. However, time constraints and lack of criterion measures are challenges to assessing the validity of historical PA recall.
This analysis extends previous research from the Coronary Artery Risk Development in Young Adults (CARDIA) study by analyzing the reproducibility of historical PA recall for the entire CARDIA sample, and examining differences in reproducibility by age, sex, and race subgroups. The time period between historical PA recalls was 15 years, which may approximate a period of time relevant to studies of cancer and other chronic diseases. Specific objectives were to: (1) examine the reproducibility of reporting PA occurring during adolescence and young adulthood; (2) assess reproducibility differences for reporting PA frequency, type, and intensity; and (3) identify demographic characteristics associated with reproducibility differences.
CARDIA is a longitudinal, population-based cohort study examining determinants of coronary artery disease risk factors in young adults. The cohort included 5115 adults. There were 5045 males and females aged 18–30 years (those <18 were excluded), with approximately equal distribution by race (black and white), sex, education (high school or less, and more than high school) and age (18–24 and 25–30). Baseline assessment occurred in Year 0 (1985–1986), with follow-up examinations conducted in Years 2, 5, 7, 10, 15, and 20. Details on CARDIA have been summarized previously. Participants provided written informed consent at each examination, and institutional review boards at each field center (University of Alabama at Birmingham; Northwestern University; University of Minnesota; and Oakland, California Kaiser Permanente) and at the coordinating center (University of Alabama at Birmingham) approved the study annually. The analytic sample includes 3605 participants who provided complete demographic and PA data at Year 0 and Year 15.
Physical activity measurement
PA outcomes included categorical ratings and continuous scores. Categorical questions assessed PA before and during high school (HS) and overall PA level during the year before the interview. Moderate- and vigorous-intensity scores were calculated from reported months of participation in specific activities. PA information was obtained for each participant at different time points: (a) Baseline PA (at Year 0), (b) Current PA (at Year 15), and (c) Historical PA (Recall of Baseline at Year 15).
Assessment of baseline physical activity at year 0
During the Year 0 exam, participants reported their overall PA level on a 5-point ordinal scale (‘inactive’ to ‘very active’) at specific times in life: (a) before HS, (b) during HS, and (c) during the 12 months before the exam. Respondents also reported their Baseline PA participation in thirteen specific moderate- and vigorous-intensity leisure time activities over the prior year: whether they had engaged in each activity (yes/no), for how many months (0–12), and in how many of those months the activity was performed for an activity-specific long duration. (See Sample Item for example survey questions on running.) Based on the CARDIA PA assessment protocol, questionnaire responses for the specific activities were also combined to form continuous moderate- and vigorous-intensity leisure Baseline PA scores. Moderate, vigorous, and total scores were calculated as the sum of each activity's MET level × months of infrequent activity plus 3 × months of frequent activity.
Sample Item
Did you jog or run in the 12 months before your first CARDIA exam for at least one hour total time in any month? For instance, you might have done three 20-minute sessions in the month.
How many months did you do this activity?
How many of these months did you do this activity for at least 2 hours per week?
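The scoring rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CARDIA scoring program: it reads the formula as intensity × (months of infrequent activity + 3 × months of frequent activity), it assumes that "infrequent" months are the months reported minus the months at the activity-specific long duration, and the intensity weights are made up for the example. The names `ActivityReport` and `pa_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActivityReport:
    """One questionnaire row for a single activity (hypothetical structure)."""
    name: str
    intensity: float      # MET-based weight; the values used below are illustrative only
    months_any: int       # months with any participation (0-12)
    months_frequent: int  # months at the activity-specific long duration


def pa_score(reports):
    """Sum intensity x (infrequent months + 3 x frequent months) over activities."""
    total = 0.0
    for r in reports:
        if not 0 <= r.months_frequent <= r.months_any <= 12:
            raise ValueError(f"inconsistent months for {r.name}")
        # Assumed definition: months reported but not at the long duration.
        months_infrequent = r.months_any - r.months_frequent
        total += r.intensity * (months_infrequent + 3 * r.months_frequent)
    return total


# Hypothetical respondent: ran in 6 months (4 of them frequently), walked all year.
reports = [
    ActivityReport("jog or run", intensity=8.0, months_any=6, months_frequent=4),
    ActivityReport("walk or hike", intensity=4.0, months_any=12, months_frequent=0),
]
print(pa_score(reports))
```

Under this reading, a month of frequent participation counts three times as much as a month of infrequent participation, so the score rewards sustained activity more than occasional activity.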
Assessment of current and historical physical activity at year 15
In the Year 15 examination (conducted in 2000–2001), participants first reported their current PA, describing moderate- and vigorous-intensity activities over the prior year with the same interview-administered instrument used in the Year 0 exam. After participants reported their current PA, the interviewer provided prompts to help them focus on the time period relevant to their Year 0 exam. First, they were primed with information about the month and year of their Year 0 exam visit, and about both their age and the President of the United States at that time. Additionally, each study center could mention a local event that had occurred at the time of the Year 0 exam. Participants were also asked questions about their living situation, job, school, marital status and whether they had children at that time. Following these reminders, and using the same questionnaire administered at Year 0, participants recalled their historical PA at Baseline by reporting their participation in the thirteen leisure time activities, including the number of months they had participated in each activity. They also reported their overall PA level before HS, during HS, and during the 12 months before the Year 0 exam, using the same 5-point ordinal scale used at Year 0.
PA at Baseline (Year 0), Current (past year at Year 15) and Historical (Recall of Baseline at Year 15) data enabled several levels of analysis: (1) whether participants could consistently classify their activity levels at particular times in their lives (e.g. HS); (2) whether they could consistently report participation in a particular activity (yes/no) in the past; (3) whether they could consistently report the number of months they participated in a particular activity in the past; and (4) whether demographic characteristics explained differences between Baseline and Historical PA reports.
We compared the distribution of demographics, body mass index (BMI), and PA score for the analytic sample at Year 0 and Year 15 with the entire enrolled CARDIA sample. We then examined reproducibility of Baseline and Historical ordinal categories of past year, HS and pre-HS activity levels with weighted kappa statistics and percent agreement. To examine whether participants tended to report Historical PA that reflected Current PA, agreement between Historical PA and Current PA was tested with percent agreement and weighted kappa statistics. Reproducibility of activity type was examined with percent agreement and kappa statistics. To explore whether participants could accurately reproduce the number of months they participated in a particular activity, we plotted the number of months they recalled doing specific activities against the number of months reported at Baseline. The number of months was collapsed into 3-month ranges to simplify presentation. For participants who reported that they did not engage in a particular activity, number of months was coded as 0. Subsequent analyses examined Baseline and Historical leisure PA scores for moderate- and vigorous-intensity activities using Spearman correlations. Similar to the categorical reports, we also examined correlations between Current PA scores and Historical PA scores. To determine correlates of reproducibility, we examined differences between Baseline PA and Historical PA scores in regression models. Covariates included Baseline activity levels (in quartiles), demographic and behavioral characteristics, and BMI. Discrepancies were examined by subtracting Baseline moderate- and vigorous-intensity activity scores from Historical scores; thus, positive scores indicate over-reporting at Year 15 compared to Year 0 and negative scores indicate under-reporting at Year 15 compared to Year 0.
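The two agreement statistics used for the ordinal ratings can be sketched as follows. This is an illustrative implementation, not the code used in the analysis: the text does not say which weighting scheme was applied, so linear weights are assumed by default (`power=1`), and the function names and example ratings are hypothetical. Categories are assumed to be coded as integers from 0 to `n_categories - 1`.

```python
def weighted_kappa(baseline, recall, n_categories, power=1):
    """Cohen's weighted kappa for two ordinal ratings coded 0..n_categories-1.

    power=1 gives linear disagreement weights, power=2 quadratic ones.
    """
    n = len(baseline)
    if n == 0 or n != len(recall):
        raise ValueError("ratings must be non-empty and of equal length")
    k = n_categories
    # Observed joint proportions and the two marginal distributions.
    observed = [[0.0] * k for _ in range(k)]
    for b, r in zip(baseline, recall):
        observed[b][r] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight grows with the distance between categories.
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - disagree_obs / disagree_exp


def percent_agreement(baseline, recall):
    """Share of participants placed in exactly the same category both times."""
    return 100.0 * sum(b == r for b, r in zip(baseline, recall)) / len(baseline)


# Hypothetical 5-point ratings (0 = 'inactive' ... 4 = 'very active'), not CARDIA data.
baseline = [0, 1, 2, 3, 4, 2, 3, 1]
recall = [0, 2, 2, 4, 4, 1, 3, 1]
print(percent_agreement(baseline, recall), weighted_kappa(baseline, recall, 5))
```

Percent agreement counts only exact matches, whereas weighted kappa gives partial credit for near misses and corrects for the agreement expected by chance from the marginal distributions.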
Table 1. Demographic characteristics of the CARDIA sample at Year 0 and the analytic sample at Year 0 and Year 15
Columns: CARDIA sample at Year 0; analytic sample at Year 0; analytic sample at Year 15
Rows: Age at Year 0; High School or Less; Body Mass Index (Normal weight, <25 kg/m2; Overweight, 25–29 kg/m2; Obese, ≥30 kg/m2); Physical Activity Score
Physical Activity Score entries, in source order (row labels not recoverable): 474 (272, 726); 444 (251, 708); 356 (176, 622); 228 (103, 396); 220 (100, 388); 173 (70, 327); 461 (288, 672); 468 (290, 684); 372 (208, 570); 351 (208, 543); 364 (214, 548); 278 (137, 454)
Table 2. Fifteen-year reproducibility of self-rated physical activity levels in adolescence and early adulthood between baseline and historical recall (n = 3,565)
Column headings: Self-rated physical activity; Baseline physical activity; Historical physical activity; Current physical activity
Row labels: Before high school; During high school
Table 3. 15-year reproducibility of reported types of activities between baseline and historical recall (n = 3,565)
Column heading: Baseline recall/Historical recall
Row labels: Home exercise / Calisthenics; Walk or hike; Bowl or golf; Gardening / Home maintenance; Home activity (shoveling, weight lifting); Job activity (lifting, carrying)
Table 4. Factors associated with adjusted mean differences between physical activity scores reported at baseline and recalled historically, stratified by intensity (n = 3,561)
Columns: Moderate-intensity activity score difference (least squares mean); Vigorous-intensity activity score difference (least squares mean)
Rows: Baseline physical activity score; Age at Year 0; Education at Year 15 (High school or Less); Body Mass Index at Year 15 (Normal weight, <25 kg/m2; Overweight, 25–29 kg/m2; Obese, ≥30 kg/m2); Sex & Race Interaction; Race & Education Interaction (Black, High School or Less; Black, Some College; Black, College Graduate; White, High School or Less; White, Some College; White, College Graduate); PA Level & Age at Year 0 Interaction (1st, 2nd, 3rd and 4th quartile, each for 18–25 years and 26–34 years)
Accurate and reliable PA assessment is essential for epidemiologists, exercise scientists, clinicians, and behavioral researchers. Recently, objective measures of PA, such as accelerometers, have dominated the public health literature. Objective devices may assess current behavior well; however, they cannot provide information about PA from the distant past. To understand the influence of historical PA on chronic disease risk, it is important to know whether PA can be reasonably recalled over the long term and to assess demographic, social, and behavioral factors that may affect recall.
Results of our comparison of PA recalled over 15 years to original reports varied by both the type of survey question and the type of information obtained. Respondents were able to reproduce categorical ratings of overall PA level (e.g., ‘very active’) reasonably well, particularly for a well-defined and significant time such as during HS (percent agreement >50% and Kappa = 0.43). As displayed in Table 2, percent agreement and Kappa were more modest for periods that were likely to be less memorable, such as before HS or the year before entry into the CARDIA study. It is also possible that HS activity was recalled well because it is a period that may be recalled and reported more often than less well-defined periods. Agreement for reports of whether respondents participated in specific activities was also reasonably reliable (see Table 3), particularly for vigorous-intensity activities (percent agreement 64-79% and Kappa 0.28-0.48). However, in some cases, high agreement may be due to the low participation rate for these activities (e.g., bowling/golf), so that one agreement category (No/No) is very large. As a result, it is not possible to determine which specific activities can be recalled best. When more quantification was involved, such as estimating the number of participation months for an activity, agreement was negatively affected by a combination of a propensity to exclude activities in the Historical report, as well as response clustering at 0 and 12 months of participation, as shown in Figures 1 and 2. It appeared that long-term recall led to a loss of time resolution, with a tendency toward all-or-none estimation, and perhaps some “splitting the difference” by estimating a value of 6 months.
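The point about a very large No/No category can be illustrated with a small worked example. The sketch below uses invented counts, not CARDIA data, and a hypothetical helper `agreement_2x2`; it compares a rarely reported activity with a commonly reported one to show that raw percent agreement can be high while chance-corrected agreement (unweighted Cohen's kappa) stays low.

```python
def agreement_2x2(yes_yes, yes_no, no_yes, no_no):
    """Return (percent agreement, Cohen's kappa) for a yes/no item asked twice."""
    n = yes_yes + yes_no + no_yes + no_no
    p_observed = (yes_yes + no_no) / n
    p_yes_first = (yes_yes + yes_no) / n
    p_yes_second = (yes_yes + no_yes) / n
    # Agreement expected by chance if the two reports were independent.
    p_expected = (p_yes_first * p_yes_second
                  + (1 - p_yes_first) * (1 - p_yes_second))
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return 100 * p_observed, kappa


# Hypothetical counts: an activity that few participants report at either time.
rare = agreement_2x2(yes_yes=20, yes_no=40, no_yes=40, no_no=900)
# Hypothetical counts: an activity reported by about half of participants.
common = agreement_2x2(yes_yes=400, yes_no=100, no_yes=100, no_no=400)
print(rare, common)
```

With these invented counts, the rare activity has the higher percent agreement but the lower kappa, because most of its agreement comes from the No/No cell that chance alone would largely fill.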
Our data showed that PA recall consistency over fifteen years among young and middle-aged adults was generally modest, but comparable to studies of similar and longer duration [14–19]. However, an important observation is that even when overall agreement was reasonably good for long-term recall studies of this type, such as the correlation of 0.50 for the vigorous activity score, error at the individual level was quite large. As shown in Figure 3, for a vigorous-intensity score of 500 at Baseline, the Historical scores ranged from 0 to 1500. This substantial error at the individual level will likely reduce the researcher’s ability to detect relationships between historical activity and outcomes within individuals. However, on the positive side, in contrast to Falkner et al., historical recall in this study appeared to reflect actual recall, rather than current activity. With the exception of the Moderate Activity Score, for which agreement of Historical scores with Current score was similar to agreement with Baseline score, Historical reports were more similar to Baseline than they were to Current reports of PA.
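One way to see why a correlation of 0.50 still leaves large individual-level error: under a simple linear model, a correlation r explains r² of the variance, so the scatter around the fitted line is sqrt(1 − r²) of the outcome's standard deviation. The sketch below applies this to the two correlations reported in this study. Treating the reported Spearman correlations as if they were Pearson correlations is a simplifying assumption, and the function name is hypothetical.

```python
import math


def residual_sd_fraction(r):
    """Fraction of the outcome's SD left unexplained by a linear fit with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return math.sqrt(1.0 - r * r)


# Correlations reported for moderate- and vigorous-intensity scores.
for r in (0.29, 0.50):
    explained = r * r
    print(f"r = {r:.2f}: variance explained = {explained:.2f}, "
          f"residual SD = {residual_sd_fraction(r):.2f} x outcome SD")
```

Under these assumptions, even the stronger correlation leaves roughly 87% of the original spread in individual scores unexplained, which is consistent with the wide range of Historical scores observed for a given Baseline score.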
The large CARDIA cohort allowed examination of demographic and other predictors of reporting discrepancy. Generally, the Historical score was higher than Baseline, particularly for vigorous-intensity scores. These results indicate over-reporting of recalled physical activity and are consistent with previous studies [16, 18]. Demographic characteristics including BMI, race and education were significantly associated with discrepancies in recall. A race by education interaction reflected the unexpected finding that over-reporting increased with higher levels of education among black participants. Our results also indicated an interaction of sex and race with the Baseline activity level that highlighted over-reporting by men, particularly black men. These results indicate that demographic factors need to be taken into consideration when pursuing studies of physical activity recall, and specific examination by subgroups should be considered. However, in our study, the recall discrepancy was also a function of Baseline activity scores, with the most active participants at Baseline likely to produce Historical activity scores lower than Baseline. Because true activity at the recalled time period is usually not available, it may be difficult to account for this source of error in studies that use recalled activity.
Overall, the current study adds to a growing body of research on long-term PA recall. These studies are important, as researchers increasingly use historical and lifetime measures to examine exposure to PA over the life course in relation to health outcomes [5, 6, 8, 16]. A major limitation of methodological studies of long-term recall PA instruments is that validity can rarely be established due to the lack of criterion measures at the period(s) being reported. That leaves reliability or reproducibility of reports as the primary indicator of instrument quality. Reliability has generally not been examined relative to the actual period of interest. Many studies look at reproducibility of lifetime reports over 3–10 week periods [4, 8, 21], or up to one year. These studies show that the reproducibility of self-reported lifetime activity recalled over short periods ranges from r = 0.53 to 0.85. Studies of specific activities recalled over longer periods (10–36 years) have shown weaker associations; correlations and kappa statistics range from 0.09 to 0.52 [14–19]. However, recall accuracy may have been affected by participants’ age, which varied from middle to older age.
This study has several notable strengths. Participants recalled activity over a long period of time (15 years) with the same instrument that was originally used. Other studies have used different instruments at two points in time, which has made interpretation and comparison of findings difficult. The CARDIA cohort provided a large and diverse sample that allowed examination of factors related to the difference between Historical and Baseline reports. The current study focused on early adulthood and included questions about activity during adolescence. The Historical recall in CARDIA provides relevant data for studies of PA exposure in early life and later health outcomes.
There were also limitations to our study. We cannot determine whether PA recall reflected what participants were actually doing at the time of the Baseline exam, only what they reported doing. There are also potential limitations of generalizability. This study relied upon a single questionnaire in a cohort of black and white young and middle-aged adults. Long-term reproducibility is, at least in part, a function of the reliability of the questionnaire. The CARDIA questionnaire has been shown to have good reliability over a two-week retest (r ~ 0.80). It is not clear how well our results will generalize to other ethnic groups, older age groups, or to studies that use different questionnaires. Additionally, we cannot estimate the effect of participant dropout from CARDIA between Years 0 and 15.
However, our results provide an important step in understanding historical PA recall and have implications for future studies. For example, for investigators assembling retrospective cohort studies in which PA is used as an exposure variable, our data suggest that individuals do well at classifying their activity level with categorical questions, particularly for memorable life periods. Categorical responses can be important indicators; for example, a five-category single-item general health question has been shown to be related to health outcomes. Seeking more quantitative precision in Historical recalls, on the other hand, may not be productive; our data showed that there was not much precision in participants’ ability to recall the amount of time over the course of a year for a given activity. These results provide important information when considering the kinds of questions that may be reasonably asked. In future research, investigators may want to consider whether the additional participant burden of asking about details regarding specific components of activity such as duration and frequency adds predictive value.
The current study was able to expand on previous research by examining different components of PA recall. This study characterized novel, systematic patterns, such as the clustering of number of months of the year of reported PA. It also showed that participants were better able to recall vigorous-intensity activities, and could accurately reproduce their activity levels during salient times of life, such as HS. Overall, these data suggest that historical PA recall over 15 years is only modestly reproducible and poor at the individual level among young and middle aged adults. Researchers should consider these limitations when undertaking studies that require assessment of PA in the distant past.
This work was supported by the National Cancer Institute, contract number: Y2-PC-0010-DC and by the National Heart, Lung and Blood Institute, CARDIA contract numbers: N01-HC-48047 – N01-HC-48050 and N01-HC-95095.
The authors thank Dr. Laurence Freedman for statistical advice and Lisa Kahle for programming support.
1. Physical Activity Guidelines Advisory Committee: Physical Activity Guidelines Advisory Committee Report, 2008. 2008, Washington, DC: U.S. Department of Health and Human Services.
2. Hallal PC, Victora CG, Azevedo MR, Wells JC: Adolescent physical activity and health: a systematic review. Sports Med. 2006, 36: 1019-1030. 10.2165/00007256-200636120-00003.
3. Bowles HR, FitzGerald SJ, Morrow JR, Jackson AW, Blair SN: Construct validity of self-reported historical physical activity. Am J Epidemiol. 2004, 160: 279-286. 10.1093/aje/kwh209.
4. Friedenreich CM, Courneya KS, Bryant HE: The lifetime total physical activity questionnaire: development and reliability. Med Sci Sports Exerc. 1998, 30: 266-274.
5. Friedenreich CM, Bryant HE, Courneya KS: Case–control study of lifetime physical activity and breast cancer risk. Am J Epidemiol. 2001, 154: 336-347. 10.1093/aje/154.4.336.
6. Kriska A, Knowler W, LaPorte R, Drash A, Wing R, Blair S, Bennett P, Kuller L: Development of questionnaire to examine relationship of physical activity and diabetes in Pima Indians. Diabetes Care. 1990, 13: 401-411. 10.2337/diacare.13.4.401.
7. Kohl HW, Kampert JB, Masse LC, Fulton JE, Tortolero SR, Blair SN: The accuracy of historical physical activity recall among middle-aged women and men [abstract]. Med Sci Sports Exerc. 1997, 29: S42-
8. Kriska A, Sandler R, Cauley J, LaPorte R, Hom D, Pambianco G: The assessment of historical physical activity and its relation to adult bone parameters. Am J Epidemiol. 1988, 127: 1053-1063.
9. Slattery ML, Jacobs DR: Assessment of ability to recall physical activity of several years ago. Ann Epidemiol. 1995, 5: 292-296. 10.1016/1047-2797(94)00095-B.
10. Cutter GR, Burke GL, Dyer AR, Friedman GD, Hilner JE, Hughes GH, Hulley SB, Jacobs DR, Liu K, Manolio TA: Cardiovascular risk factors in young adults. The CARDIA baseline monograph. Control Clin Trials. 1991, 12: 1S-77S.
11. Jacobs JR, Hahn LP, Haskell WL, Pirie P, Sidney S: Validity and reliability of short physical activity history: CARDIA and the Minnesota Heart Health Program. J Cardiopulm Rehabil. 1989, 9: 448-459. 10.1097/00008483-198911000-00003.
12. Sidney S, Jacobs DR, Haskell WL, Armstrong MA, Dimicco A, Oberman A, Savage PJ, Slattery ML, Sternfeld B, Van Horn L: Comparison of two methods of assessing physical activity in the coronary artery risk development in young adults (CARDIA) study. Am J Epidemiol. 1991, 133: 1231-1245.
13. Crowne DP, Marlowe D: The Approval Motive: Studies in Evaluative Dependence. 1964, New York: John Wiley & Sons, Inc.
14. Blair SN, Dowda M, Pate RR, Kronenfeld J, Howe HG, Parker G, Blair A, Fridinger F: Reliability of long-term recall of participation in physical activity by middle-aged men and women. Am J Epidemiol. 1991, 133: 266-275.
15. Falkner KL, McCann SE, Trevisan M: Participant characteristics and quality of recall of physical activity in the distant past. Am J Epidemiol. 2001, 154: 865-872. 10.1093/aje/154.9.865.
16. Fransson E, Knutsson A, Westerholm P, Alfredsson L: Indications of recall bias found in a retrospective study of physical activity and myocardial infarction. J Clin Epidemiol. 2008, 61: 840-847. 10.1016/j.jclinepi.2007.09.004.
17. Lee MM, Whittemore AS, Lung DL: Reliability of recalled physical activity, cigarette smoking, and alcohol consumption. Ann Epidemiol. 1992, 2: 705-714. 10.1016/1047-2797(92)90015-I.
18. Lissner L, Potischman N, Troiano R, Bengtsson C: Recall of physical activity in the distant past: the 32-year follow-up of the prospective population study of women in Goteborg, Sweden. Am J Epidemiol. 2004, 159: 304-307. 10.1093/aje/kwh048.
19. Winters-Hart CS, Brach JS, Storti KL, Trauth JM, Kriska AM: Validity of a questionnaire to assess historical physical activity in older women. Med Sci Sports Exerc. 2004, 36: 2082-2087.
20. Falkner KL, Trevisan M, McCann SE: Reliability of recall of physical activity in the distant past. Am J Epidemiol. 1999, 150: 195-205. 10.1093/oxfordjournals.aje.a009980.
21. Evenson KR, Wilcox S, Pettinger M, Brunner R, King AC, McTiernan A: Vigorous leisure activity through women’s adult life: the Women’s health initiative observational cohort study. Am J Epidemiol. 2002, 156: 945-953. 10.1093/aje/kwf132.
22. Chasan-Taber L, Erickson JB, McBride JW, Nasca PC, Chasan-Taber S, Freedson PS: Reproducibility of a self-administered lifetime physical activity questionnaire among female college alumnae. Am J Epidemiol. 2002, 155: 282-289. 10.1093/aje/155.3.282.
23. Idler EL, Benyamini Y: Self-rated health and mortality: a review of twenty-seven community studies. J Health Soc Behav. 1997, 38: 21-37. 10.2307/2955359.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2458/13/180/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.