
Educational inequality in multimorbidity: causality and causal pathways. A mendelian randomisation study in UK Biobank



Multimorbidity, typically defined as having two or more long-term health conditions, is associated with reduced wellbeing and life expectancy. Understanding the determinants of multimorbidity, including whether they are causal, may help with the design and prioritisation of prevention interventions. This study seeks to assess the causality of education, BMI, smoking and alcohol as determinants of multimorbidity, and the degree to which BMI, smoking and alcohol mediate differences in multimorbidity by level of education.


Participants were 181,214 females and 155,677 males, mean ages 56.7 and 57.1 years respectively, from UK Biobank. We used a Mendelian randomization design, an approach that uses genetic variants as instrumental variables to interrogate causality.


The prevalence of multimorbidity was 55.1%. Mendelian randomization suggests that lower education, higher BMI and higher levels of smoking causally increase the risk of multimorbidity. For example, one standard deviation (equivalent to 5.1 years) increase in genetically-predicted years of education decreases the risk of multimorbidity by 9.0% (95% CI: 6.5 to 11.4%). A 5 kg/m2 increase in genetically-predicted BMI increases the risk of multimorbidity by 9.2% (95% CI: 8.1 to 10.3%) and a one SD higher lifetime smoking index increases the risk of multimorbidity by 6.8% (95% CI: 3.3 to 10.4%). Evidence for a causal effect of genetically-predicted alcohol consumption on multimorbidity was less strong; an increase of 5 units of alcohol per week increases the risk of multimorbidity by 1.3% (95% CI: 0.2 to 2.5%). The proportions of the association between education and multimorbidity explained by BMI and smoking are 20.4% and 17.6% respectively. Collectively, BMI and smoking account for 31.8% of the educational inequality in multimorbidity.


Education, BMI, smoking and alcohol consumption are intervenable causal risk factors for multimorbidity. Furthermore, BMI and lifetime smoking make a considerable contribution to the generation of educational inequalities in multimorbidity. Public health interventions that improve population-wide levels of these risk factors are likely to reduce multimorbidity and inequalities in its occurrence.



Multimorbidity, defined as patients living with two or more chronic health conditions, is associated with reduced quality of life and life expectancy [1]. The ageing population is driving an increase in the prevalence of multimorbidity, which already affects approximately one in four of the population in the UK and USA [2, 3]. Identifying the main reversible causes of multimorbidity could inform the design of preventative strategies, helping to improve quality of life for patients and reduce the economic impact of multimorbidity.

There are considerable inequalities in multimorbidity. People from more deprived backgrounds are more likely to be multimorbid, and more likely to develop multimorbidity at an earlier age. For example, a study covering one third of the Scottish population found that young and middle-aged adults over 30 years in the most deprived areas had comparable sex-specific rates of multimorbidity to those in the least deprived areas who were 10–15 years older [2]. Other risk factors have been postulated for multimorbidity, including alcohol, smoking and BMI [4]. Given the social patterning of these exposures, however, associations are likely to be highly confounded and establishing causality is challenging. In addition, the association of education and multimorbidity may be mediated by these risk factors [5].

Mendelian Randomisation (MR) uses genetic variants in non-experimental (observational) data to make causal inferences. MR is an instrumental variable (IV) analysis that uses genetic variants robustly associated with an exposure to estimate the causal effect of that exposure on an outcome; the resulting estimate is less prone to confounding and reverse causation bias [6]. For an introduction to MR analysis see [7]. In brief, IV analyses use another variable to proxy the exposure of interest. This ‘instrument’ is chosen because it meets strict statistical criteria, which makes the IV estimate of the exposure-outcome association much less likely to be biased. More recently, the MR arena has undergone rapid development [8], with new methods available to assess causality for both mediation (the causal pathways linking an exposure to an outcome) [9] and effect modification (the study of whether one exposure alters the effect of another) [10]. Commonly, the genetic instrument used is a polygenic risk score (PRS) for the exposure, derived by weighting each SNP by the regression coefficients from the discovery genome-wide association study.
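The weighted-sum construction of a PRS described above can be sketched in a few lines. This is an illustrative sketch only: the three SNP weights and the genotype dosages are hypothetical values, the function names are our own, and the use of the population standard deviation when standardising is an assumption rather than something stated in the paper.

```python
from statistics import mean, pstdev


def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one person."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def standardise(scores):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(scores)
    sd = pstdev(scores)  # assumption: population SD
    return [(s - mu) / sd for s in scores]


# Hypothetical GWAS regression coefficients for three SNPs.
weights = [0.10, -0.05, 0.20]
# Hypothetical effect-allele dosages for three people.
people = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]

raw_scores = [polygenic_risk_score(p, weights) for p in people]
z_scores = standardise(raw_scores)
print(raw_scores)
print(z_scores)
```

After standardisation, a one-unit difference in the score corresponds to one standard deviation of the PRS in the sample.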

In this paper, our aim is to use MR to evaluate the causal effects of BMI, smoking, alcohol intake and years of education on multimorbidity. Further, we evaluate the degree to which BMI, smoking, and alcohol consumption explain educational inequalities in multimorbidity, and we consider whether the risk factors interact with one another in their effects on multimorbidity. To our knowledge, this is the first study to date to interrogate the causality of observed determinants of multimorbidity, and to study the mechanisms linking education to multimorbidity using an approach that is robust to confounding and reverse causality.



UK Biobank is a population-based health research resource consisting of approximately 500,000 people, aged between 38 years and 73 years, who were recruited between the years 2006 and 2010 from across the UK [11]. The resource is particularly focused on identifying determinants of human diseases in middle-aged and older individuals. Participants provided a range of information (such as demographics, health status, lifestyle measures, cognitive testing, personality self-report, and physical and mental health measures) via questionnaires and interviews; anthropometric measures, blood pressure readings and samples of blood, urine and saliva were also taken (data available from UK Biobank). The study design, participants and quality control (QC) methods have been described in detail previously [12]. UK Biobank received ethical approval from the Research Ethics Committee (REC reference for UK Biobank is 11/NW/0382).

Exposures were all assessed at the baseline research assessment. We followed a published approach [13] for inferring years of education from highest achieved qualification (see Supplement for further detail). Body Mass Index (BMI) in kg/m2 was calculated using height and weight measurements. We derived a lifetime smoking index, representing a continuous score of smoking behaviours and incorporating smoking initiation, duration, heaviness, and cessation, using a previously published approach [14, 15]. (This approach was used because lifetime smoking scores incorporate heaviness but are applicable to both smokers and non-smokers.) Following the approach used in a previous Genome-Wide Association Study (GWAS) [16], we derived estimated units of alcohol consumed per week. We used responses to the baseline touchscreen questionnaire on weekly red wine, white wine and champagne, beer and cider, fortified wine, spirit and other consumption to estimate the typical units of alcohol consumed per week. Former drinkers (those who previously drank alcohol but no longer do) were set to missing and excluded from analyses because treating them as non-drinkers would be inappropriate and data on their previous alcohol consumption was unavailable. Similarly, we excluded individuals with very high current alcohol consumption (> 200 units per week). Responders who indicated they were never-drinkers were set to 0 units per week.

Our primary outcome was the standard definition of multimorbidity, the presence of two or more chronic conditions. Three additional multimorbidity measures were used in secondary analyses: the presence of 3 + and 4 + conditions, and the Cambridge multimorbidity score (CMMS) with general-outcome weights [17]. This general CMMS is a continuous measure, with conditions weighted according to the average standardised weights from models of consultations, mortality and emergency admissions. For all measures of multimorbidity, the presence or absence of 35 health conditions was considered as per Payne et al. [17] (see Condition Definitions table, Supplement). Blindness/low vision and learning disability were excluded from the original condition list owing to the lack of appropriate self-reported variables. In contrast to the condition definitions applied by Payne et al. [17], which included temporal restrictions and use of medications, our definitions were simplified to self-reported ‘ever’ having had a condition, with the exception of cancer (self-reported doctor diagnosed new cancer estimated to be within the last 5 years, excluding non-melanoma skin cancer), hearing loss, constipation and painful condition (see Supplement for full details). The information was obtained via a touchscreen questionnaire, which was followed by a nurse-led interview to clarify and categorise conditions correctly. We derived each measure of multimorbidity twice, including and excluding alcohol problems in the definition, because alcohol consumption was included as an exposure or mediator in certain models. (We used the multimorbidity outcomes excluding alcohol for all models that included alcohol as an exposure/mediator.) We used the CPRD @ Cambridge – code lists (GOLD) Version 1.1 (Cambridge, UK; University of Cambridge, 2018) as a point of reference when assigning variables to condition categories [downloaded May 2020].

Genetic data: Details of the in-house quality control filtering applied to the genetic data are provided in the Supplement. Quality control filtering of the UK Biobank data was conducted by R. Mitchell, G. Hemani, T. Dudding, L. Corbin, S. Harrison and L. Paternoster, as described in the published protocol [18].

Statistical methods

Participants were included in our analysis if they had complete data (outcome, covariates, polygenic risk score and exposure) for at least one exposure, they were of white British ancestry (to avoid confounding by population stratification) and they passed genetic QC criteria (see Supplement: Quality Control of Genetic Data). Related individuals were included in the GWAS (where relatedness was accounted for) but excluded from subsequent regression analyses. For analyses including alcohol consumption, former drinkers were removed from analyses because we were unable to consider the timing of stopping alcohol consumption in relation to the development of multimorbidity. In addition, former drinkers may be violating their genetic trajectory (for example, they may be genetically predisposed to be heavier drinkers). All analyses were conducted using both standard regression models (with no instrumental variable), and using MR. The study design was cross-sectional cohort.

Multivariable regression analyses were used to assess the association between each exposure and each measure of multimorbidity (2 + conditions, 3 + conditions, 4 + conditions and the CMMS). Linear, rather than logistic, regression was used for all regression models so that the estimates were on the same scale as the MR estimates and represented risk differences. All multivariable regressions were run with robust standard errors and adjusted for age, sex, 40 genetic principal components and UKBB assessment centre.

Mendelian randomization (MR) uses genetic variants known to be robustly related to the exposure of interest as instrumental variables. MR analyses were run via two-stage least squares using ivreg2 [19] in Stata [20] with the “robust” option specified (to enforce robust standard errors). For all MR analyses, we used a ‘split sample’ approach to avoid sample overlap with published GWASs (which can bias estimates [21]) and to implement uniform methodology across exposures. This involved splitting the UK Biobank sample into two halves randomly. Within each half, we ran a GWAS (using BOLT-LMM [22] and the MRC IEU UK Biobank GWAS pipeline) to identify genetic variants related to each of the four exposures, adjusting for age at baseline clinic, sex and 40 genetic principal components (to account for population structure). All SNPs with a p-value less than or equal to 5 × 10⁻⁸ were used to derive a polygenic risk score (PRS) for each exposure in the alternative split, weighted by the regression coefficients from the GWAS. Clumping was performed at an R2 threshold of 0.001 within a 10,000 kb window, and proxies were identified using the European sub-sample of the 1000 Genomes as a reference panel [23] and a lower R2 limit of 0.8. The PRSs were standardised by subtracting the mean and dividing by the standard deviation. The PRSs defined based on the GWAS from one half of the UKBB sample were applied in MR analyses of the other half of the UKBB sample. MR analyses were run using two-stage least squares [19] and were adjusted for age, sex, 40 genetic principal components and UKBB assessment centre. The beta coefficients and standard errors from MR analyses within each half of the sample were then meta-analysed to give one estimate for beta, a 95% confidence interval, and an I2 estimate as an indication of heterogeneity between the estimates in each split [24, 25]. Fixed-effects meta-analyses were performed using the metan command [26] in Stata. Analyses were scaled such that coefficients represented an SD change in education (equivalent to 5.1 years), a 5 kg/m2 increase in BMI, a 5 units per week increase in alcohol consumption, and an SD unit increase in lifetime smoking index. (As an example, a 1 SD increase in lifetime smoking is roughly the same as being a current smoker who has smoked 5 cigarettes per day for 12 years, rather than a never smoker.)
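The split-sample logic (an IV estimate in each half, then a fixed-effects pooled estimate) can be illustrated on simulated data. The sketch below is not the authors' Stata code: it uses a single instrument with no covariates (in which case two-stage least squares reduces to the ratio of covariances), conventional rather than robust standard errors, and entirely hypothetical simulation parameters (true causal effect 0.3, one unmeasured confounder). All function and variable names are our own.

```python
import math
import random


def iv_estimate(z, x, y):
    """Single-instrument 2SLS: cov(Z, Y) / cov(Z, X), with a conventional SE."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    szz = sum((a - mz) ** 2 for a in z)
    beta = szy / szx
    resid_ss = sum(((yi - my) - beta * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    sigma2 = resid_ss / (n - 2)
    return beta, math.sqrt(sigma2 * szz) / abs(szx)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (no instrument)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effects pooling with a 95% CI and I-squared (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, (beta - 1.96 * se, beta + 1.96 * se), i2


def simulate_half(n, seed):
    """Hypothetical data: PRS z, confounded exposure x, outcome y (true effect 0.3)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)   # standardised PRS
        u = rng.gauss(0, 1)    # unmeasured confounder
        xi = 0.5 * zi + u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(0.3 * xi + u + rng.gauss(0, 1))
    return z, x, y


z1, x1, y1 = simulate_half(10000, seed=1)
z2, x2, y2 = simulate_half(10000, seed=2)
beta1, se1 = iv_estimate(z1, x1, y1)
beta2, se2 = iv_estimate(z2, x2, y2)
pooled_beta, pooled_se, pooled_ci, i2 = fixed_effect_meta([beta1, beta2], [se1, se2])
print("OLS slope in half 1:", ols_slope(x1, y1))
print("Pooled IV estimate:", pooled_beta, pooled_ci, "I2 (%):", i2)
```

In this simulation the unadjusted regression slope is pulled away from the true effect by the confounder, whereas the pooled IV estimate should land close to it, at the cost of a wider confidence interval.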

Sensitivity analyses [27] to test the assumption of no pleiotropy in MR analysis were run for the main outcome (at least two chronic conditions) using MR-Egger [28], IVW [29], the simple modal estimator [30] and the unweighted median estimator [31].
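Two of these estimators, IVW and MR-Egger, can be sketched from per-SNP summary statistics. This is a simplified illustration and not the mrrobust implementation: it gives point estimates only, weights each SNP by the inverse variance of its SNP-outcome association, and uses made-up SNP effects. IVW fits a weighted regression of SNP-outcome on SNP-exposure effects through the origin; MR-Egger adds an intercept, which is zero when there is no directional pleiotropy.

```python
def weighted_fit(x, y, w, intercept):
    """Weighted least squares of y on x; returns (intercept, slope)."""
    if not intercept:
        slope = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(
            wi * xi * xi for wi, xi in zip(w, x)
        )
        return 0.0, slope
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def ivw(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate (regression through the origin)."""
    weights = [1.0 / s ** 2 for s in se_y]
    return weighted_fit(beta_x, beta_y, weights, intercept=False)[1]


def mr_egger(beta_x, beta_y, se_y):
    """MR-Egger (intercept, slope); SNPs are oriented so beta_x is non-negative."""
    oriented = [(-bx, -by) if bx < 0 else (bx, by) for bx, by in zip(beta_x, beta_y)]
    bx = [p[0] for p in oriented]
    by = [p[1] for p in oriented]
    weights = [1.0 / s ** 2 for s in se_y]
    return weighted_fit(bx, by, weights, intercept=True)


# Hypothetical SNP-exposure effects and a true causal effect of 0.5.
beta_x = [0.1, 0.2, 0.3, 0.4]
se_y = [0.01, 0.01, 0.01, 0.01]
valid = [0.5 * b for b in beta_x]               # no pleiotropy
pleiotropic = [0.02 + 0.5 * b for b in beta_x]  # constant directional pleiotropy

print(ivw(beta_x, valid, se_y), mr_egger(beta_x, valid, se_y))
print(ivw(beta_x, pleiotropic, se_y), mr_egger(beta_x, pleiotropic, se_y))
```

With the pleiotropic toy data, the IVW estimate is biased away from 0.5 while the MR-Egger slope recovers it and the intercept picks up the pleiotropic shift.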

BMI, smoking, and alcohol consumption are all potential consequences of educational attainment, possibly lying on the causal pathway between education and multimorbidity and explaining part of the effect and thus are considered as potential mediators of the education-multimorbidity relationship. Mediation of the association between years of education and multimorbidity was assessed by including each potential mediator (BMI, smoking, and alcohol consumption) in turn as a covariate in a linear regression of multimorbidity on years of education. The joint contribution of the BMI and smoking mediators was assessed by including both variables as covariates. Similarly, in MR analyses, mediation was assessed by including both years of education and (a) each mediator in turn, and (b) both smoking and BMI mediators as exposures in a multivariable MR analysis [32, 33]. The coefficients for years of education from these regressions and multivariable MR models estimate the ‘direct effect’; i.e. the effect of years of education on multimorbidity that operates independently of the mediator(s) being considered. The ‘indirect effect’, i.e. the effect of years of education on multimorbidity that operates through the mediator(s) is estimated by subtracting the direct effect from the total effect (the coefficient for years of education from a regression on multimorbidity not accounting for any mediators). The proportion mediated is calculated as the indirect effect divided by the total effect, multiplied by 100 to express as a percentage. 95% confidence intervals of the indirect effects and proportions mediated were calculated using Stata’s -bootstrap- command and 200 repeats. MR analyses used the same ‘split sample’ approach as the main analysis. For mediation analyses, we restricted analyses to two definitions of multimorbidity – the main outcome variable of two or more chronic conditions, and the CMMS, which, as a continuous variable, offers greatest statistical power.
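The difference method described above reduces to simple arithmetic once the total and direct effects are in hand. The sketch below uses hypothetical effect sizes, not estimates from this study, and the function name is our own. It also flags inconsistent mediation, interpreted here (an assumption) as a direct effect larger in magnitude than the total effect.

```python
def mediation_summary(total_effect, direct_effect):
    """Return (indirect effect, proportion mediated in %, inconsistent flag)."""
    if total_effect == 0:
        raise ValueError("proportion mediated is undefined when the total effect is zero")
    indirect = total_effect - direct_effect
    proportion = 100.0 * indirect / total_effect
    # Assumption: inconsistent mediation means |direct| exceeds |total|.
    inconsistent = abs(direct_effect) > abs(total_effect)
    return indirect, proportion, inconsistent


# Hypothetical risk differences per SD of education.
total = -0.10    # education effect without mediators in the model
direct = -0.08   # education effect with the mediator included

indirect, pm, flag = mediation_summary(total, direct)
print(f"indirect = {indirect:.3f}, proportion mediated = {pm:.1f}%, inconsistent = {flag}")
```

In the study, confidence intervals for these quantities came from bootstrapping; a sketch of that step would resample participants and recompute both models in each replicate.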

Additive interaction effects between each pairwise exposure combination were assessed in multivariable linear regressions by including the product term. For MR analyses, we used a previously published approach to assessing interactions [10]; for two exposures, A and B, the instruments used in the multivariable MR model are PRS exposure A, PRS exposure B, PRS exposure A x PRS exposure B, and PRS exposure A x PRS exposure A. The last instrument was included as this has been shown to be necessary in the presence of a causal effect of one exposure on the other [10]. Our assumptions regarding the causal ordering of the risk factors, and hence the instruments used, are provided in the Supplement. In both multivariable regression and MR models, interactive effects were only estimated for the outcomes of multimorbidity status (2 + conditions) and the CMMS.
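The instrument set listed above for two exposures A and B can be assembled directly from the two standardised scores. This is a minimal sketch with hypothetical PRS values and a function name of our own; it only builds the instrument columns and does not fit the multivariable MR model.

```python
def interaction_instruments(prs_a, prs_b):
    """Per-person instruments: PRS A, PRS B, PRS A x PRS B, PRS A x PRS A."""
    if len(prs_a) != len(prs_b):
        raise ValueError("both scores must cover the same participants")
    return [[a, b, a * b, a * a] for a, b in zip(prs_a, prs_b)]


# Hypothetical standardised scores for three participants.
prs_a = [1.0, -0.5, 2.0]
prs_b = [0.5, 1.5, -1.0]

for row in interaction_instruments(prs_a, prs_b):
    print(row)
```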

As a sensitivity analysis, we re-ran the main observational regressions with multimorbidity status defined as 2 + conditions using logistic regression (as opposed to linear regression). We used gformula [34], which allows for a binary outcome, to estimate the proportion mediated and indirect effects.

Analyses were run using Stata version 16 [20] and R version 3.6.1 [35]. Stata packages used in this analysis include rsource [36], ivreg2 [19] and mrrobust [27]. R packages used include reshape [37], data.table [38], plyr [39], dplyr [40], R.utils [41] and devtools [42].

Testing the assumptions of MR

We performed sensitivity analyses to test the MR assumption of no pleiotropy. We did not examine the association of the PRSs with potential confounders for several reasons. Firstly, when the exposure is education, plausible confounders would be early-life and intergenerational factors, for which data are not available. When the exposure is BMI, smoking or alcohol consumption, the most plausible confounder is education. However, MR studies [43, 44] have shown effects of these risk factors on education, and thus testing for an absence of association of these PRSs with education is not a reasonable test for the assumptions of MR in this instance.


A total of 336,891 individuals (67% of the original sample) were included in the analysis (after removal of withdrawals, those failing genetic QC/without genetic data, those without phenotype data and related individuals). In the final sample, the mean age was 56.9 years (IQR 51–63 years) and 53.8% of participants were female (Table 1). 55.1% of the participants had a history of at least 2 chronic conditions at baseline. 12.6% of individuals had at least 4 chronic conditions. The most common conditions overall were hearing loss (37%), anxiety & other neurotic, stress related & somatoform disorders OR depression (35%) and painful condition (29%) (Supplement page 3). The mean CMMS in the total sample was 0.7 (IQR 0.1–1.1).

Former drinkers (N = 11,461) were removed from analyses involving alcohol. In this subset, 73% had 2 + conditions, 27% had 4 + conditions; the mean CMMS was 1.1 (1 d.p.).

Associations of educational attainment, BMI, smoking, and alcohol consumption with multimorbidity (2 + conditions)

Both multivariable regression and MR suggest that fewer years of education, higher BMI, and higher lifetime smoking index are all associated with increased risk of multimorbidity (Fig. 1). In MR analyses, a one SD higher level of education (equivalent to an additional 5.1 years) is associated with a reduction in risk of multimorbidity (2 + conditions) of 9.0% (risk difference (RD) = -0.090, 95% CI = -0.114, -0.065), a 5 kg/m2 increase in BMI is associated with a 9.2% increased risk of multimorbidity (RD = 0.092, 95% CI = 0.081, 0.103), and a one SD higher lifetime smoking index is associated with a 6.8% increased risk of multimorbidity (RD = 0.068, 95% CI = 0.033, 0.104). Although both multivariable regression and MR analyses also suggest that higher alcohol consumption is a risk factor for multimorbidity, the effect sizes were smaller than for the other exposures. In MR analyses, an increase of 5 units of alcohol per week increases the risk of multimorbidity (2 + conditions) by 1.3% (RD = 0.013, 95% CI = 0.002, 0.025). For all exposures, the estimates from MR analyses were more extreme than the estimates from multivariable regression, but the confidence intervals were wider for MR; e.g. the risk difference for multimorbidity for a 1 SD higher smoking index was 0.048 (95% CI 0.046 to 0.050) in multivariable regression, and 0.068 (95% CI 0.033 to 0.104) in MR. The R2 and F statistics from the unadjusted linear regression of the exposure on the PRS, in addition to the number of SNPs in the PRS, are presented in Supplementary Table 6 for each split.

Mechanisms explaining educational inequality in multimorbidity

In MR analyses, the proportions of the educational inequality in multimorbidity explained by BMI and smoking when each risk factor was considered separately were 20% and 18% respectively (Fig. 2). When considered together in MR analyses, the two risk factors explained 32% of the educational inequality in multimorbidity. This contrasts with multivariable regression analyses, where the proportions mediated were estimated to be 28% and 25% for BMI and smoking respectively, and 51% for both risk factors combined. Multivariable regression estimated the proportion of the educational inequality in multimorbidity explained by alcohol consumption to be 0.1% (Supplementary Table 3). We did not generate an overall MR estimate for the proportion mediated by alcohol consumption because of inconsistent mediation, i.e. direct effect greater than the total effect, in one of the dataset splits (Supplementary Table 3).

Interactions between risk factors for multimorbidity

Multivariable regression analyses to evaluate the interactive effect of pairwise combinations of the exposures on the risk of having at least 2 chronic conditions (Table 2) suggest that there are interactions between some of the risk factors, namely BMI*smoking, BMI*alcohol, smoking*alcohol, and smoking*education. However, the magnitude of all interaction effects was small. Analogous MR analyses of these interactive effects gave point estimates that were larger in magnitude than the estimates from multivariable regression, with the direction being consistent for 3/6 of the pairwise combinations, but the interactions were imprecisely estimated, with wide confidence intervals that crossed the null for all interaction effects.

Sensitivity analyses

Secondary analyses using alternative definitions of multimorbidity yielded a similar pattern of results for the associations of years of education, BMI, smoking, and alcohol consumption with multimorbidity (Supplementary Tables 1, 2) and for mediation of the educational inequality in the CMMS (Supplementary Table 3). Similar to the main outcome, MR confidence intervals for all interaction terms were wide when analyses were repeated with CMMS as the outcome, and the direction of effect was consistent with multivariable regression for 2/6 pairwise combinations (Supplementary Table 4). In multivariable regression analyses, the interactive effect of smoking and education on the CMMS was in the opposite direction to that for the main outcome.

Sensitivity analyses to test the assumption of no pleiotropy in the main MR analysis (outcome of 2 + conditions) (Supplementary Table 5) revealed estimates that were generally directionally consistent. The MR-Egger constant estimates suggest evidence for directional pleiotropy for BMI and smoking.

The sensitivity analyses re-running the main observational regressions using logistic regression and the mediation analysis using gformula [34] are presented in Supplementary Tables 7 and 8. The single exposure logistic regressions (Supplementary Table 7) revealed associations in the same direction as the linear regression analyses. The interaction analyses revealed that, as with the main analysis, any interactions were of very small magnitude. The proportion of the association between education and multimorbidity mediated by the other exposures, when calculated using gformula (which allows for a binary outcome), was strikingly similar for all mediators examined to that from the mediation analyses using linear regression (Supplementary Table 8).

Table 1 Descriptive Characteristics of Study Participants

Fig. 1 Multivariable regression (MVR) and Mendelian Randomization (MR) results for the causal effect (Risk Difference, RD) of each exposure on multimorbidity status (2 + chronic conditions)

Fig. 2 Mediation of the educational inequality in multimorbidity (2 + chronic conditions) by BMI, lifetime smoking index, and BMI and lifetime smoking index combined. Analyses conducted using multivariable regression (MVR) and Mendelian randomization (MR). Estimate presented is the Proportion Mediated (PM)

Table 2 Interactions between risk factors for multimorbidity (2 + chronic conditions). Analyses conducted using multivariable regression (MVR) and Mendelian randomization (MR) to estimate additive interactions on the risk difference scale


This study has provided evidence for a causal effect of lower educational attainment, higher BMI and higher level of smoking on multimorbidity status. There was also weak evidence for a causal effect of greater alcohol consumption on risk of multimorbidity, although the magnitude of effects was generally smaller than for the other risk factors. In our analyses, one standard deviation of years of education (equivalent to 5.1 additional years) equates approximately to a 9% decrease in risk of multimorbidity. For education, BMI, smoking, and alcohol consumption, estimated effects on multimorbidity were greater in MR analyses compared with multivariable regression. However, confidence intervals for MR results were wide and, with the exception of the coefficient for education, spanned the point estimate from multivariable regression models.

Our analysis suggests that 20% of educational inequality in multimorbidity is explained by BMI, and 32% is jointly explained by BMI and smoking. This is slightly less than the 51% of educational inequality in multimorbidity explained by BMI and smoking in multivariable regression. However, 48–88% of the total effect of education on multimorbidity remains unaccounted for by these risk factors. We did not include alcohol in conjunction with the other potential mediators because neither multivariable nor MR analyses provided evidence that alcohol consumption mediated the effect of education on multimorbidity. Units consumed per week is also a crude measure of alcohol consumption, which could partially explain the lack of mediation by alcohol use. Looking ahead we need to consider other explanatory mechanisms, which are likely to be numerous, complex and span multiple social, behavioural and biological domains.

While there may be interactions between various lifestyle and anthropometric exposures on risk of multimorbidity, we could not provide evidence for these within a causal framework possibly due to low power to detect interactive effects. In multivariable analyses, where statistical power is greater than MR, interactions were generally of small magnitude, and were most often in the opposite direction to the main effects of the risk factors (i.e. the cumulative effect of having both risk factors was generally less than would be predicted from their individual effects), suggesting that interactions between the risk factors we studied are not a major contributor to the aetiology of multimorbidity.

A recent study [45] of over 400,000 GP-registered adults in England concluded that over half of health service utilisation is attributable to individuals with multimorbidity. Furthermore, the ageing population is leading to an increase in the prevalence of multimorbidity over time. Identifying the preventable causal determinants of multimorbidity is thus paramount for easing the pressure on health services. Our analysis suggests that population-level interventions to reduce BMI and smoking would likely lead to both a reduction in the occurrence of multimorbidity, and a reduction in educational inequalities in multimorbidity.

A key strength of our study is the use of Mendelian randomization to improve causal inference. In traditional epidemiological study designs, confounding factors and reverse causation can bias the estimated associations between putative risk factors for multimorbidity. Furthermore, when analysing the mediating pathways explaining educational inequalities in multimorbidity, measurement error in the mediator can lead to an underestimate of the contribution of mediating variables [46]. The use of MR overcomes these limitations of previous analyses.

There is a body of work devoted to defining multimorbidity [2, 17, 45]. We explored three definitions of multimorbidity increasing in severity from 2 + to 4 + chronic conditions, in addition to a multimorbidity score, which captures all available information as a continuum with conditions weighted by the average standardised weights from models of consultations, mortality and emergency admissions. Findings were generally consistent across these definitions. Nonetheless, our findings may be driven by the prevalence of conditions feeding into the definition of multimorbidity. Multimorbidity is not a single ‘entity’; for different patients the state of multimorbidity can represent diverse combinations of health conditions. The most commonly reported health conditions in this study were hearing loss (37%), anxiety & other neurotic, stress related & somatoform disorders OR depression (35%) and painful condition (29%). Thus, our findings may represent established causal effects of education, BMI [47], smoking, and alcohol on these conditions (e.g. associations of obesity [48] and smoking behaviour [49] with hearing loss have been previously reported). Nonetheless, as these conditions underlie many cases of multimorbidity, this does not detract from the implications of our findings about the potential public health and clinical impact of interventions to improve population levels of these risk factors. Further work identifying distinct clusters of health conditions to explore whether different ‘types’ of multimorbidity have distinct aetiologies may be of interest.

Our study has several limitations. Firstly, the use of self-reported health conditions may have led to misclassification of multimorbidity status for some people. However, self-reported data (unlike linked primary and secondary care data) was available across the whole sample. Secondly, UK Biobank participants are known to be over-selected from higher socio-economic categories [50], and the use of genetic data in this analysis necessitated restricting to people of white British ethnicity. This may have led to underestimation of the effects of exposures on multimorbidity, such that the effects we demonstrate can be viewed as minimal likely causal effects in a population more representative of Great Britain as a whole. Thirdly, with the exception of cancer, hearing loss, painful condition and constipation, we defined chronic conditions based on self-report of ever having been diagnosed by a doctor. In contrast, some studies [45] base their definition on long-term “currently active” conditions, making our definition less specific. However, our definition ensures that we can be as inclusive as possible with regard to the conditions contributing to multimorbidity. A further disadvantage of using condition history, rather than active conditions, in the definition of multimorbidity is that the multivariable regression analyses could be subject to reverse causation bias. The MR analyses, however, should not be, as the exposures and mediators here are the “lifetime average”. In addition, because the multimorbidity outcome is defined as past or current illness, there is no temporal ordering of the exposure, mediator and outcome in the observational analyses. Again, the MR analysis overcomes this.

Fourthly, we excluded former drinkers from analyses of alcohol because these individuals are known to have worse health outcomes than never drinkers and analysing them as non-drinkers would be inappropriate. There is no suitable option for how to address the former drinker group; the available data in UK Biobank does not permit detailed analysis of prior drinking patterns or time since stopping alcohol consumption. However, this means that our conclusions about the effects of alcohol on multimorbidity may not extend to former drinkers. In addition, removal of former drinkers reduced the sample size for these analyses and hence the power to detect effects. Although we found weak evidence of an effect of alcohol on multimorbidity, the effect size was smaller than for the other risk factors. This may at least partially reflect the complexity of defining alcohol use. Here we used a continuous measure of units per week, but other measures such as binge drinking may also be relevant for disease outcomes, particularly in the context of educational attainment [51]. Our analysis of current alcohol units consumed per week also does not capture previously heavy but now light drinkers. An additional study limitation is that our analyses assume linear effects of the risk factors on multimorbidity; this assumption may not hold for all relationships. In particular, there is evidence that BMI has a non-linear relationship with mortality, albeit only in smokers [52]. Although methods to investigate non-linear relationships are available, statistical power would be insufficient to combine these methods with the multivariable MR approach used in this paper. Importantly, although we checked where possible that our analysis met the assumptions [8] of MR, our conclusions rely on the validity of these assumptions. Lastly, although multivariable regression analyses demonstrated some interactions between risk factors, these were not detected in MR analyses. This likely reflects insufficient power to examine interactive effects within a causal framework.

The results of this study suggest that education, BMI, smoking, and, to a lesser degree, alcohol consumption, all have causal effects on multimorbidity. Furthermore, BMI and smoking explain approximately one third of the educational inequality in multimorbidity. In the UK, school attendance is compulsory until age 18, and policies to increase educational attendance would therefore focus on increasing university participation. Such policies may potentially influence multimorbidity risk. However, policies to mitigate the health disadvantage of low education may be more realistic and within reach of public health, thus motivating our study of the pathways explaining educational inequality in multimorbidity. Interventions to reduce population levels of BMI and smoking could lead to reduced occurrence and reduced educational inequalities in multimorbidity.
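The "approximately one third" figure can be set against the individual mediator estimates reported in the abstract (20.4% for BMI, 17.6% for smoking, 31.8% jointly). The Python sketch below does that arithmetic and also shows one common way a proportion mediated is computed, the difference method. The function name `proportion_mediated` and the example effect sizes passed to it are hypothetical, and the difference method is shown as an assumption about how such proportions are typically obtained rather than as a description of the exact procedure used in this study.

```python
def proportion_mediated(total_effect: float, direct_effect: float) -> float:
    """Difference method: (total effect - direct effect) / total effect."""
    if total_effect == 0:
        raise ValueError("total_effect must be non-zero")
    return (total_effect - direct_effect) / total_effect


# Proportions of the education effect explained (percent), as reported
# in the abstract of this article.
BMI_ALONE = 20.4
SMOKING_ALONE = 17.6
BMI_AND_SMOKING_JOINTLY = 31.8

# Adding the single-mediator estimates overstates the joint contribution
# whenever the two pathways overlap.
naive_sum = BMI_ALONE + SMOKING_ALONE
overlap = naive_sum - BMI_AND_SMOKING_JOINTLY

print(f"sum of individual proportions: {naive_sum:.1f}%")
print(f"jointly explained: {BMI_AND_SMOKING_JOINTLY:.1f}%")
print(f"difference (percentage points): {overlap:.1f}")

# Hypothetical effect sizes, chosen only to demonstrate the formula.
example = proportion_mediated(total_effect=-0.10, direct_effect=-0.07)
print(f"example proportion mediated: {example:.2f}")
```

The joint estimate being smaller than the sum of the two individual estimates is consistent with BMI and smoking sharing part of the pathway from education to multimorbidity.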

Data availability

UK Biobank data access procedures are governed by UK Biobank,

The code for the analyses can be found here:


  1. National Guideline Centre. Multimorbidity: clinical assessment and management. London: National Institute for Health and Care Excellence; 2016.

  2. Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. The Lancet. 2012;380(9836):37–43.

  3. US Department of Health and Human Services. Multiple chronic conditions - a strategic framework: optimum health and quality of life for individuals with multiple chronic conditions. Washington, DC; 2010.

  4. Katikireddi SV, Skivington K, Leyland AH, Hunt K, Mercer SW. The contribution of risk factors to socioeconomic inequalities in multimorbidity across the lifecourse: a longitudinal analysis of the Twenty-07 cohort. BMC Med. 2017;15(1):152.

  5. Mutz J, Roscoe CJ, Lewis CM. Exploring health in the UK Biobank: associations with sociodemographic characteristics, psychosocial factors, lifestyle and environmental exposures. BMC Med. 2021;19(1):240.

  6. Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27(8):1133–63.

  7. Davies NM, Holmes MV, Davey Smith G. Reading mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ. 2018;362:k601.

  8. Zheng J, Baird D, Borges M-C, Bowden J, Hemani G, Haycock P, et al. Recent developments in mendelian randomization studies. Curr Epidemiol Rep. 2017;4(4):330–45.

  9. Burgess S, Thompson SG. Multivariable mendelian randomization: the Use of Pleiotropic Genetic Variants to Estimate Causal Effects. Am J Epidemiol. 2015;181(4):251–60.

  10. North T-L, Davies NM, Harrison S, Carter AR, Hemani G, Sanderson E, et al. Using Genetic Instruments to Estimate interactions in mendelian randomization studies. Epidemiology. 2019;30(6):e33–e5.

  11. Allen NE, Sudlow C, Peakman T, Collins R. UK Biobank Data: come and get it. Sci Transl Med. 2014;6(224):224ed4.

  12. Collins R. What makes UK Biobank special? The Lancet. 2012;379(9822):1173–4.

  13. Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533:539.

  14. Leffondré K, Abrahamowicz M, Xiao Y, Siemiatycki J. Modelling smoking history using a comprehensive smoking index: application to lung cancer. Stat Med. 2006;25(24):4132–46.

  15. Wootton RE, Richmond RC, Stuijfzand BG, Lawn RB, Sallis HM, Taylor GMJ, et al. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a mendelian randomisation study. Psychol Med. 2020;50(14):2435–43.

  16. Clarke TK, Adams MJ, Davies G, Howard DM, Hall LS, Padmanabhan S, et al. Genome-wide association study of alcohol consumption and genetic overlap with other health-related traits in UK Biobank (N = 112 117). Mol Psychiatry. 2017;22(10):1376–84.

  17. Payne RA, Mendonca SC, Elliott MN, Saunders CL, Edwards DA, Marshall M, et al. Development and validation of the Cambridge Multimorbidity score. Can Med Assoc J. 2020;192(5):E107.

  18. Mitchell R, Hemani G, Dudding T, Corbin L, Harrison S, Paternoster L. UK Biobank Genetic Data: MRC-IEU Quality Control, version 2. 2019.

  19. Baum CF, Schaffer ME, Stillman S. ivreg2: Stata module for extended instrumental variables/2SLS, GMM and AC/HAC, LIML and k-class regression. 2010.

  20. StataCorp. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC; 2019.

  21. Hartwig FP, Davies NM, Hemani G, Davey Smith G. Two-sample mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. Int J Epidemiol. 2017;45(6):1717–26.

  22. Loh P-R, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, et al. Efficient bayesian mixed-model analysis increases association power in large cohorts. Nat Genet. 2015;47(3):284–90.

  23. McVean GA, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65.

  24. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60.

  25. Higgins JP, Thompson SG. Controlling the risk of spurious findings from meta-regression. Stat Med. 2004;23(11):1663–82.

  26. Harris RJ, Bradburn MJ, Deeks JJ, Harbord RM, Altman DG, Sterne JAC. Metan: fixed- and random-effects meta-analysis. Stata J. 2008;8(1):3–28.

  27. Spiller W, Davies NM, Palmer TM. Software application profile: mrrobust—a tool for performing two-sample summary mendelian randomization analyses. Int J Epidemiol. 2019;48(3):684–90.

  28. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44(2):512–25.

  29. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using Summarized Data. Genet Epidemiol. 2013;37(7):658–65.

  30. Hartwig FP, Davey Smith G, Bowden J. Robust inference in summary data mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol. 2017;46(6):1985–98.

  31. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in mendelian randomization with some Invalid Instruments using a weighted median estimator. Genet Epidemiol. 2016;40.

  32. Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable mendelian randomization in the single sample and two-sample summary data settings. bioRxiv. 2018.

  33. Carter AR, Sanderson E, Hammerton G, Richmond RC, Davey Smith G, Heron J, et al. Mendelian randomisation for mediation analysis: current methods and challenges for implementation. Eur J Epidemiol. 2021;36(5):465–78.

  34. Daniel R. GFORMULA. Stata module to implement the g-computation formula for estimating causal effects in the presence of time-varying confounding or mediation. Statistical Software Components S457204. Revised 29 September 2021 ed. Boston College Department of Economics; 2010.

  35. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2019.

  36. Newson R. RSOURCE. Stata Module to run R from inside Stata using an R source file. revised 09 May 2016 ed. Boston College Department of Economics Statistical Software Components S456847; 2007.

  37. Wickham H. Reshaping data with the reshape package. J Stat Softw. 2007;21(12).

  38. Dowle M, Srinivasan A. data.table: Extension of data.frame. 2021.

  39. Wickham H. The Split-Apply-combine strategy for Data Analysis. J Stat Softw. 2011;40(1):1–29.

  40. Wickham H, Francois R, Henry L, Muller K. dplyr: A Grammar of Data Manipulation. R Package Version 1.0.4. 2021.

  41. Bengtsson H. R.utils: Various Programming Utilities. R Package Version 2.10.1. 2020.

  42. Wickham H, Hester J, Chang W. devtools: Tools to Make Developing R Packages Easier. R Package Version 2.2.0. 2019.

  43. Howe LD, Kanayalal R, Harrison S, Beaumont RN, Davies AR, Frayling TM, et al. Effects of body mass index on relationship status, social contact and socio-economic position: mendelian randomization and within-sibling study in UK Biobank. Int J Epidemiol. 2020;49(4):1173–84.

  44. Harrison S, Davies AR, Dickson M, Tyrrell J, Green MJ, Katikireddi SV, et al. The causal effects of health conditions and risk factors on social and socioeconomic outcomes: mendelian randomization in UK Biobank. Int J Epidemiol. 2020;49(5):1661–81.

  45. Cassell A, Edwards D, Harshfield A, Rhodes K, Brimicombe J, Payne R, et al. The epidemiology of multimorbidity in primary care: a retrospective cohort study. Br J Gen Pract. 2018;68(669):e245–e51.

  46. Blakely T, McKenzie S, Carter K. Misclassification of the mediator matters when estimating indirect effects. J Epidemiol Commun Health. 2013;67(5):458.

  47. Tyrrell J, Mulugeta A, Wood AR, Zhou A, Beaumont RN, Tuke MA, et al. Using genetics to understand the causal influence of higher BMI on depression. Int J Epidemiol. 2018;48(3):834–48.

  48. Li W, Peng Y, Chen D, Lu Z, Tao Y. Association of weight change across adulthood with hearing loss: a retrospective cohort study. Int J Obes. 2022;46(10):1825–32.

  49. Garcia Morales EE, Ting J, Gross AL, Betz JF, Jiang K, Du S, et al. Association of cigarette smoking patterns over 30 years with audiometric hearing impairment and Speech-in-noise perception: the atherosclerosis risk in Communities Study. JAMA Otolaryngology–Head & Neck Surgery. 2022;148(3):243–51.

  50. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of Sociodemographic and Health-Related characteristics of UK Biobank participants with those of the General Population. Am J Epidemiol. 2017;186(9):1026–34.

  51. Beard E, Brown J, West R, Kaner E, Meier P, Michie S. Associations between socio-economic factors and alcohol consumption: a population survey of adults in England. PLoS ONE. 2019;14(2):e0209442.

  52. Sun Y-Q, Burgess S, Staley JR, Wood AM, Bell S, Kaptoge SK, et al. Body mass index and all cause mortality in HUNT and UK Biobank studies: linear and non-linear mendelian randomisation analyses. BMJ. 2019;364:l1042.

This research has been conducted using the UK Biobank Resource under Application Number 19278.

This work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol –

This study used the MRC IEU UK Biobank GWAS pipeline. Please see: Elsworth, BL, Mitchell, R, Raistrick, CA, Paternoster, L, Hemani, G, Gaunt, TR (2019): MRC IEU UK Biobank GWAS pipeline version 2.

We would like to thank Dr Gemma Hammerton for her help using gformula.


The funders had no role in the design of the study or the decision to publish.

LDH and TLN are supported by a Career Development Award from the UK Medical Research Council, to LDH (MR/M020894/1). ARC is supported by the MRC Integrative Epidemiology Unit (MC_UU_00011/6) and the University of Bristol British Heart Foundation Accelerator Award (AA/18/7/34219).

CS is partially supported by NIHR ARC West and is an NIHR Senior Investigator. The views expressed in this article are those of the author(s) and not necessarily those of the NIHR, or the Department of Health and Social Care. 

DCB is supported by Wellcome and is a PhD student. REW is funded by the Norwegian South Eastern Regional Health Authority (2020024). 

Author information

The study was conceived and designed by TLN, LDH, SH, RAP, and CS. Statistical analysis was carried out by TLN. TLN and LDH wrote the paper. All authors contributed to interpretation of the results and critical revisions of the manuscript.

Corresponding author

Correspondence to Teri-Louise North.

Ethics declarations

Ethics approval and consent to participate

UK Biobank received ethical approval from the Research Ethics Committee (REC reference for UK Biobank is 11/NW/0382).

Consent for publication

Not applicable.

Competing interests

TGR is an employee of GlaxoSmithKline outside of this work. AC is an employee of Novo Nordisk outside of this work. LH received a Career Development Award from the Medical Research Council for the submitted work, which also supported TLN; ARC received funding from the University of Bristol Medical Research Council Integrative Epidemiology Unit [MC_UU_00011/1 and MC_UU_00011/6]; NIHR Applied Research Collaboration West provides funding towards CS’s salary; National Institute for Health Research provides funding towards CS’s research expenses; in the past 36 months REW received a grant from the South-Eastern Norway Regional Health Authority [2020024], REW worked in a unit funded by the Medical Research Council [MC_UU_00011/3 and MC_UU_00011/7], REW had a previous postdoc funded by the Wellcome Trust [204895/Z/16/Z], RP received an institution-paid grant from the Medical Research Council, RP received an institution-paid grant from the National Institute for Health and Care Research; in the past 36 months REW wrote a report on literature relating smoking and mental health for the public charity ‘Action on Smoking and Health’; in the past 36 months REW received support for attending meetings and/or travel from (1) the Society for Research on Nicotine and Tobacco New Investigator Award, (2) the Gro Harlem Brundtland Visiting Scholarship at the Centre for Fertility and Health, Norwegian Institute of Public Health, (3) an International Convention of Psychological Science travel grant; within the past 36 months RP has been the Chair of the Society of Academic Primary Care and has been a member (payment to institution) of the MHRA Pharmacovigilance Expert Advisory Group; within the past 36 months RP has had a personal paid role as Consultant Editor for the journal Prescriber; within the past 36 months ARC received an honorarium from the American Medical Association Memphis Chapter for delivering a Mendelian randomization workshop.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

North, TL., Harrison, S., Bishop, D.C. et al. Educational inequality in multimorbidity: causality and causal pathways. A mendelian randomisation study in UK Biobank. BMC Public Health 23, 1644 (2023).
