Study population
The HRS is a nationally representative, longitudinal study that has biennially assessed the financial, physical, and mental well-being of community-dwelling adults at least 50 years of age and their spouses since 1992. Since its inception, the HRS has periodically enrolled new participants. The HRS is sponsored by the National Institute on Aging (NIA U01AG009740) and is conducted by the University of Michigan [10].
From 2006 to 2012, the HRS collected genetic data from a subsample of non-Hispanic white and non-Hispanic black respondents who consented and provided salivary deoxyribonucleic acid (DNA). Details on sample selection and consent procedures are available elsewhere [11]. We restricted our analysis to non-Hispanic white and non-Hispanic black respondents with available genetic information and followed them from 1992 to 2014. We linked the HRS data files compiled by the RAND Corporation [12] with the HRS genetic data, which include a PGS for diabetes [11]; the assay and score calculation procedures are described in detail elsewhere [11].
Measures
Outcome
Incident diabetes was determined by a respondent’s affirmative response to the question: “Since we last talked to you, that is since [last interview date], has a doctor ever told you that you have diabetes or high blood sugar?” Our outcome was the age at which a respondent first reported a diabetes diagnosis. Respondents who had not reported diabetes by the last wave in 2014, or who died without ever reporting diabetes, were right-censored.
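The outcome definition above can be illustrated with a minimal Python sketch that turns one respondent's wave-by-wave answers into an (entry age, exit age, event) record for survival analysis. This is not the authors' code: the `WaveRecord` type and `survival_outcome` function are hypothetical, and the optional `death_age` argument reflects an assumption, since the exact censoring age for respondents who died is not specified here. How respondents who already reported diabetes at their first interview were handled is also not specified; in this sketch they would receive an event at their entry age.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WaveRecord:
    age: float              # respondent's age at this interview
    reports_diabetes: bool  # affirmative answer to the doctor-diagnosis question


def survival_outcome(
    waves: List[WaveRecord], death_age: Optional[float] = None
) -> Tuple[float, float, int]:
    """Return (entry_age, exit_age, event) for one respondent.

    event = 1 at the first wave with an affirmative diabetes report.
    Otherwise event = 0 and the respondent is right-censored at the last
    interview, or at death_age if supplied (ASSUMPTION: the censoring age
    for decedents is not specified in the text).
    """
    if not waves:
        raise ValueError("respondent has no interview records")
    waves = sorted(waves, key=lambda w: w.age)
    entry_age = waves[0].age
    for w in waves:
        if w.reports_diabetes:
            return entry_age, w.age, 1  # age at first reported diagnosis
    exit_age = death_age if death_age is not None else waves[-1].age
    return entry_age, exit_age, 0


history = [WaveRecord(60, False), WaveRecord(62, False), WaveRecord(64, True)]
print(survival_outcome(history))  # (60, 64, 1)
```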
Exposure
GWASs have identified a large number of genetic variants, typically SNPs, associated with a wide range of health outcomes and behaviors. However, most of these variants have small individual effects, and those identified typically represent only a small fraction of all truly associated variants, so individually they have limited predictive power. A PGS aggregates this information into a single weighted measure linked to a phenotype of interest [13]. Genotypes in the HRS were assessed using Illumina HumanOmni2.5 BeadChips (HumanOmni2.5-4v1, HumanOmni2.5-8v1, HumanOmni2.5-8v1.1; Illumina, Inc., San Diego, CA, USA), yielding more than 1.9 million SNPs after standard quality control procedures [14].
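A PGS of this kind is commonly computed as a weighted sum of risk-allele counts. The Python sketch below illustrates that calculation under stated assumptions: dosages are coded as 0, 1, or 2 copies of the risk allele, weights are per-allele GWAS effect sizes, and the SNP identifiers and weight values are hypothetical. It is not the HRS scoring pipeline, which may handle strand alignment, missing genotypes, and SNP selection differently [11].

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages over SNPs present in both inputs.

    dosages: SNP id -> number of risk-allele copies carried (0, 1 or 2).
    weights: SNP id -> per-allele GWAS effect size (e.g., a log odds ratio).
    """
    shared = dosages.keys() & weights.keys()  # SNPs both genotyped and weighted
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical SNP identifiers and weights, for illustration only.
weights = {"snp_a": 0.10, "snp_b": 0.25, "snp_c": 0.05}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_x": 1}
print(round(polygenic_score(person, weights), 2))  # 0.45
```

SNPs without a weight (here `snp_x`) simply do not contribute, which mirrors restricting the score to variants that overlap the genotyped sample.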
The diabetes PGS used in this analysis was constructed by HRS researchers from a meta-analysis of diabetes GWASs conducted by Morris and colleagues. Of the SNPs considered in that meta-analysis, more than 700,000 overlapped with the HRS genotype data; ultimately, ten of these were found to be significant and were used to construct the diabetes PGS [15]. SNP effect sizes were estimated in samples of primarily European ancestry, using a stage one (discovery) sample of 12,171 diabetes cases and 56,862 controls and a stage two (replication) sample of 22,669 cases and 58,119 controls [15].
Because the GWASs in this meta-analysis were based on European ancestry groups, SNP weights derived from European-ancestry samples were also applied to construct the PGS for respondents of African ancestry, which may affect the predictive power and interpretation of the diabetes PGS among non-Hispanic blacks [13, 16, 17]. The HRS standardized the PGSs within each race group (mean = 0, standard deviation [SD] = 1) [11]. This z-score allows a simple interpretation: estimates correspond to a one-SD increase in the PGS within a race group, rather than to a change of one risk allele. In our primary analysis, the PGS was included as a continuous standardized score, with higher scores reflecting greater genetic susceptibility to diabetes. We also performed sensitivity analyses with the PGS as a dichotomous variable (z-score < 0 vs. z-score ≥ 0) and as a categorical variable splitting the PGS into tertiles.
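The within-group standardization and the two sensitivity codings can be sketched as follows. The function names are hypothetical. Using the population SD (`pstdev`) and computing tertile cut points on whatever sample is passed in (pooled or within race) are assumptions, because the text does not state these details; `quantiles` is used with Python's default "exclusive" method.

```python
from statistics import mean, pstdev, quantiles
from typing import Hashable, List, Sequence


def standardize_within_groups(scores: Sequence[float], groups: Sequence[Hashable]) -> List[float]:
    """z-score each raw PGS using the mean and SD of its own group."""
    stats = {}
    for g in set(groups):
        vals = [s for s, gg in zip(scores, groups) if gg == g]
        stats[g] = (mean(vals), pstdev(vals))  # ASSUMPTION: population SD
    return [(s - stats[g][0]) / stats[g][1] for s, g in zip(scores, groups)]


def dichotomize(z: Sequence[float]) -> List[int]:
    """1 if z >= 0 (at or above the group mean), else 0."""
    return [int(v >= 0) for v in z]


def tertiles(z: Sequence[float]) -> List[int]:
    """Assign 1, 2 or 3 using the sample's 1/3 and 2/3 quantiles as cut points."""
    c1, c2 = quantiles(z, n=3)
    return [1 if v <= c1 else 2 if v <= c2 else 3 for v in z]


print(tertiles(list(range(1, 10))))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```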
Covariates
We selected covariates based on their anticipated association with diabetes. Sociodemographic covariates included sex (male, female), race (non-Hispanic white, non-Hispanic black), foreign born (yes, no), level of education (less than high school, high school/GED, some college, college or above), and partnership status (married/partnered, not married/partnered). Measures of economic well-being included employment status (employed, unemployed, retired, disabled, not in labor force), household income (log-transformed; see Footnote 1), household wealth (log-transformed; see Footnote 2), and whether the respondent had Medicare (yes, no), Medicaid (yes, no), or another form of health insurance (yes, no). We assessed behavioral and lifestyle characteristics using self-reported BMI (continuous), exercise (waves 1-6: vigorous activity at least three times per week; waves 7-12: vigorous activity more than once per week), smoking status (never, current, or former smoker), and alcohol consumption (consuming three or more alcoholic drinks on days the respondent drank). Extreme BMI values (BMI < 10 or BMI > 75) were recoded as missing. We also included self-reported binary indicators of whether the respondent had been diagnosed between waves with high blood pressure, cardiovascular disease, or arthritis, which are important comorbidities of diabetes [14, 18, 19]. Our main interest was in the behavioral and lifestyle variables and in how their associations with diabetes changed when our exposure was added. Adjusting for the sociodemographic covariates, measures of economic well-being, and health comorbidities was intended to reduce confounding of the estimates for the behavioral and lifestyle variables.
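Two of the recodings described above, the BMI plausibility bounds and the wave-dependent exercise threshold, are sketched below. The function names are hypothetical, and `sessions_per_week` is an assumed numeric stand-in for the HRS response categories rather than the actual survey variable. The handling of zero or negative income and wealth before log transformation is not shown, because those details are not given in this section.

```python
from typing import Optional

BMI_MIN, BMI_MAX = 10.0, 75.0  # plausibility bounds stated in the text


def clean_bmi(bmi: Optional[float]) -> Optional[float]:
    """Recode implausible BMI values (< 10 or > 75) as missing (None)."""
    if bmi is None or bmi < BMI_MIN or bmi > BMI_MAX:
        return None
    return bmi


def vigorous_exercise(wave: int, sessions_per_week: float) -> bool:
    """Binary exercise indicator whose threshold depends on the wave.

    HYPOTHETICAL input: sessions_per_week is a numeric stand-in for the
    HRS response categories, which differ between waves 1-6 and 7-12.
    """
    if not 1 <= wave <= 12:
        raise ValueError("wave must be between 1 and 12")
    if wave <= 6:
        return sessions_per_week >= 3  # at least three times per week
    return sessions_per_week > 1       # more than once per week
```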
Additionally, we adjusted for birth cohort to account for the HRS sampling design, which introduces new birth cohorts approximately every six years. We also included ancestry-specific principal components to account for possible confounding from population stratification and from ancestry differences in genetic makeup that could bias estimates, as recommended in the literature [11, 16]. See Ware et al. [11] for details on the construction of the ancestry-specific principal components. For brevity, estimates for the principal components are not displayed in our tables.
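Ancestry-specific principal components are typically obtained from a principal component analysis of standardized genotypes within an ancestry group. The pure-Python sketch below extracts the leading components of the genetic relationship matrix by power iteration; it only illustrates the idea on a toy matrix and is not the procedure of Ware et al. [11], whose software, SNP pruning, and scaling may differ. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def standardize_snps(genotypes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center and scale each SNP column (dosages 0/1/2); drop monomorphic SNPs."""
    n = len(genotypes)
    kept = []
    for col in zip(*genotypes):
        mu = sum(col) / n
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / n)
        if sd > 0:
            kept.append([(x - mu) / sd for x in col])
    return [list(row) for row in zip(*kept)]


def top_pcs(genotypes, k: int = 2, iters: int = 500, seed: int = 0) -> List[List[float]]:
    """Top-k eigenvectors of the n x n genetic relationship matrix, one row per person.

    Uses power iteration with deflation; the sign of each component is arbitrary.
    """
    X = standardize_snps(genotypes)
    if not X or not X[0]:
        raise ValueError("no polymorphic SNPs")
    n, m = len(X), len(X[0])
    # Genetic relationship matrix G = X X^T / m (symmetric, positive semidefinite).
    G = [[sum(a * b for a, b in zip(X[i], X[j])) / m for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                raise ValueError("no remaining variance to extract")
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; subtract this component (deflation).
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        components.append(v)
    return [list(row) for row in zip(*components)]
```

Because the relationship matrix grows with the square of the sample size, this approach is only practical for toy examples; genome-scale data require dedicated software.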
Statistical analysis
Our analytic sample consisted of 15,190 respondents, of whom 12,090 were non-Hispanic white and 3,100 were non-Hispanic black. Over the study period, these respondents contributed 103,059 person-years of follow-up.
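Person-years of follow-up are the sum, over respondents, of the time between study entry and either the event or censoring. A minimal sketch with a hypothetical function name follows; the intervals in the usage line are made-up values, not HRS data.

```python
from typing import Iterable, Tuple


def person_years(intervals: Iterable[Tuple[float, float]]) -> float:
    """Total follow-up: sum over respondents of (exit age - entry age), in years."""
    total = 0.0
    for entry_age, exit_age in intervals:
        if exit_age < entry_age:
            raise ValueError("exit age precedes entry age")
        total += exit_age - entry_age
    return total


print(person_years([(50, 60), (55, 57.5)]))  # 12.5
```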
Kaplan-Meier survival curves and multivariable Cox regression models [20] were used to estimate the contribution of the diabetes PGS to diabetes onset after adjusting for time-varying measures of behavioral and lifestyle characteristics. First, we fit models including all covariates except the diabetes PGS and the ancestry-specific principal components, both in the full analytic sample and stratified by race to account for ancestral differences between non-Hispanic whites and non-Hispanic blacks [17, 21]. Stratification is warranted because most GWASs, including the one conducted by Morris and colleagues [15], are performed predominantly in samples of European descent, so the predictive ability of the PGS may differ by race. We then refit these models with the genetic variables added as independent variables, again both in the full analytic sample and stratified by race. This second set of models showed how the associations changed with the inclusion of the genetic components. Concordance (the proportion of comparable pairs in which the subject with the higher predicted risk experienced the event before the subject with the lower predicted risk) was used as a measure of model discrimination [22]. Analyses of deviance based on log likelihoods compared corresponding models in the first and second sets [23]. Because each second-set model is nested within its first-set counterpart, these likelihood ratio tests indicate whether including the genetic components improved model fit.
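The concordance statistic and the nested-model comparison described above can be sketched in Python as follows. This is a simplified illustration with hypothetical function names, not the implementation in the R “survival” package: `harrell_c` ignores tied event times and the delayed entry implied by the age time scale, and `chi2_sf` evaluates the chi-square tail probability with a series expansion suited to moderate degrees of freedom. For the likelihood ratio test, the degrees of freedom equal the number of parameters added in the second-set model (here, the PGS plus the principal components).

```python
import math
from typing import Sequence, Tuple


def harrell_c(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's concordance index.

    A pair is comparable when the subject with the shorter time had the event;
    it is concordant when that subject also has the higher predicted risk.
    Ties in predicted risk count as 1/2. Simplification: tied times and
    delayed entry are ignored.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z > 700:  # p-value is below double precision for moderate df
        return 0.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-16:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int) -> Tuple[float, float]:
    """Deviance difference between nested models and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


print(harrell_c([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))  # 1.0
```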
In all survival models, we used cluster-robust standard errors to account for the clustering of respondents within households in the HRS and to address potential within-household spillover effects [24]. We used age as the time scale in all analyses, with each individual’s age at study entry as the start of follow-up. All statistical analyses were performed in R version 3.5.0 [25], using the “survival” package for our primary analyses [26]. Statistical significance was assessed at the 5% level.
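Using age as the time scale, with follow-up starting at the age of study entry, means that each respondent joins the risk set only from that age onward (delayed entry). The Kaplan-Meier sketch below, with a hypothetical function name and made-up toy data, shows how this changes the risk set; it does not reproduce the “survival” package's implementation or the cluster-robust variance estimation.

```python
from typing import List, Sequence, Tuple


def km_delayed_entry(
    entry_age: Sequence[float], exit_age: Sequence[float], event: Sequence[int]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimates on the age scale with delayed entry.

    At each event age t, the risk set contains respondents under observation
    at t, i.e. entry age < t <= exit age. Returns (age, S(age)) pairs.
    """
    event_ages = sorted({t for t, e in zip(exit_age, event) if e})
    surv, curve = 1.0, []
    for t in event_ages:
        at_risk = sum(1 for a, b in zip(entry_age, exit_age) if a < t <= b)
        cases = sum(1 for b, e in zip(exit_age, event) if e and b == t)
        if at_risk == 0:
            continue  # only zero-length records (event at entry) end at t; skipped
        surv *= 1.0 - cases / at_risk
        curve.append((t, surv))
    return curve


curve = km_delayed_entry([50, 50, 60, 55], [55, 70, 65, 60], [1, 0, 1, 0])
print(curve)  # [(55, 0.5), (65, 0.25)]
```

In this toy example, the respondents entering at ages 55 and 60 are not yet at risk at age 55, so the estimate drops to 0.5 rather than to 0.75, as it would if everyone were treated as at risk from the start.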