
A foresight whole systems obesity classification for the English UK biobank cohort

A Correction to this article was published on 11 April 2022

This article has been updated



The number of people living with obesity or who are overweight presents a global challenge, and the development of effective interventions is hampered by a lack of research that takes a joined-up, whole systems approach, considering multiple elements of the complex obesity system together. We need to better understand the collective characteristics and behaviours of those who are overweight or have obesity and how these differ from those who maintain a healthy weight.


Using the UK Biobank cohort we develop an obesity classification system using k-means clustering. Variable selection from the UK Biobank cohort is informed by the Foresight obesity system map across key domains (Societal Influences, Individual Psychology, Individual Physiology, Individual Physical Activity, Physical Activity Environment).


Our classification identifies eight groups of people, similar in respect to their exposure to known drivers of obesity: ‘Younger, urban hard-pressed’, ‘Comfortable, fit families’, ‘Healthy, active and retirees’, ‘Content, rural and retirees’, ‘Comfortable professionals’, ‘Stressed and not in work’, ‘Deprived with less healthy lifestyles’ and ‘Active manual workers’. Pen portraits are developed to describe the characteristics of these different groups. Multinomial logistic regression is used to demonstrate that the classification can effectively detect groups of individuals more likely to be living with overweight or obesity. The group identified as ‘Comfortable, fit families’ are observed to have a higher proportion of healthy weight, while three groups have increased relative risk of being overweight or having obesity: ‘Active manual workers’, ‘Stressed and not in work’ and ‘Deprived with less healthy lifestyles’.


This paper presents the first study of UK Biobank participants to adopt this obesity system approach to characterising participants. It provides an innovative new approach to better understand the complex drivers of obesity which has the potential to produce meaningful tools for policy makers to better target interventions across the whole system to reduce overweight and obesity.



Obesity presents a global challenge for society, with 650 million adults (13% of the total adult population) estimated to be living with obesity worldwide [1, 2] and 39% of adults classed as overweight (a figure that includes those with obesity). This complex problem, which involves a multitude of conflicting stakeholders, lifestyle choices and physiological factors, is not limited to adults: in 2016 over 340 million (18%) children and adolescents (aged 5–19 years) were overweight or had obesity globally [1]. The prevalence of overweight and obesity continues to increase, and with it related comorbidities, even though obesity is preventable. In recent years there has been significant investment by United Kingdom (UK) research funders [3, 4] to better understand, and subsequently prevent, weight-related health problems; however, overweight and obesity remain prevalent.

Research to date concludes that the drivers of obesity are complex and multifaceted, and not as simple as consuming less food and drink or moving more [5, 6]. Biological, social/cultural, ecological and psychological factors combine to create a tangled web of obesity-promoting behaviours and environments [7]. In turn, interventions and policy decisions to prevent overweight and obesity are complex. Tackling individual elements in isolation, such as proximity to fast food outlets [8] or green spaces [9], will inevitably limit success. The interventions leading to the greatest prevention of overweight and obesity are hypothesised to come from a whole systems approach at a macro level, tackling multiple components of the obesity system at a large geographic scale [5, 10]. In the UK, ranked the 6th most obese OECD country [11], the obesity system was comprehensively mapped in 2007 as part of the UK government’s Foresight initiative [12, 13], yet attempts to use this map in its entirety for obesity prevention have been limited [5, 14,15,16]. A recent systematic review identified just thirty qualitative studies that had taken a whole systems approach to obesity or other complex public health challenges [16]. More studies took a quantitative approach, with 44 identified. These followed a range of study designs [17,18,19] but for the most part did not report their methods clearly, which, given the complexity of a whole systems approach, makes their findings difficult to interpret. Linked data on all elements of the whole system are not readily available at either the individual or the population level, which adds further to the challenge of taking a whole systems approach to obesity (or other complex public health challenges) [15, 16, 18].
A comprehensive data mapping exercise against the Foresight obesity system map was completed in 2018, concluding that more can be done using traditional and novel data sources to incorporate more aspects of the obesity system into ongoing research [15].

We believe that a classification developed from the robust Foresight obesity system map framework and applied to a cohort with a range of weight statuses is novel, and that it will demonstrate an approach to whole systems obesity research using existing cohort data. The methods reported here should also transfer to other settings where it is valuable to predict obesity risk from wide-ranging social data.

The aims of this study are to (i) investigate the feasibility of using the Foresight map as a framework for data-driven obesity research and policy making, (ii) develop an obesity classification system where variable selection is informed by the Foresight obesity system map, applied to a sizeable sub-sample of the UK Biobank cohort of 500,000 adults, and (iii) test this against overweight and obesity outcomes. We hypothesise that distinct classes/clusters will be generated that differentiate between weight statuses. This method can then be applied to other cohorts and populations where identifying individuals at risk of, or already living with, overweight or obesity is important to support prevention and healthier lifestyle behaviours.


The UK Biobank is a large prospective cohort study of people aged 40 to 70 years, with baseline data collected between 2006 and 2010 in England, Scotland and Wales [20, 21]. The data were collected during an initial assessment at 22 regional centres and provide information on the participants’ socio-demographic and economic situation, various physiological measurements, cognitive abilities and a limited number of biomarkers. Participants were asked to consent to the linking of their health episode and death data. Subsequently, sub-samples of the participants were invited back for repeat assessments, for example to contribute to an imaging study or to measure physical activity levels using accelerometers. The cohort employed a range of data collection methods, including self-reported dietary surveys, and Body Mass Index (BMI) was calculated from height and weight measured at the initial assessment visit.
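BMI is weight in kilograms divided by the square of height in metres. The sketch below computes it and maps it to weight-status categories, assuming the standard WHO adult cut-offs (18.5, 25 and 30 kg/m²); this section does not itself state the cut-offs used, and the function names are illustrative only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def weight_status(bmi_value: float) -> str:
    """Map BMI to a category using the assumed WHO adult cut-offs."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "healthy weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    # Hypothetical participant measured at an assessment visit: 85 kg, 1.75 m.
    value = bmi(85.0, 1.75)
    print(f"BMI = {value:.1f} ({weight_status(value)})")  # BMI = 27.8 (overweight)
```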

In order to produce a classification of participants that aligns with known drivers of obesity, we select UK Biobank variables [22] that map onto the Foresight obesity systems map [5, 23]. This involved two ‘sifting’ exercises. The first sift was undertaken by three researchers (a population geographer, a nutritionist and a statistician) and involved independently considering every UK Biobank variable then available, using the UK Biobank ‘Data Showcase’ to evaluate its utility when mapped to the Foresight obesity map. The three sets of assessments were then discussed and a candidate subset of UK Biobank variables was requested. Having obtained these variables from UK Biobank, the same team conducted a second sift. Access to the variables at the participant level allowed a more nuanced consideration of distributions and cross-tabulations to further evaluate the utility of a subset of variables. At this point it also became clear that many of the candidate variables covered only part of the UK Biobank cohort (e.g. measurements from wearable accelerometers (n = 103,695) and food intake diaries (n = 70,714)); including such variables would have greatly reduced the classification sample. This process is summarised in Fig. 1.

Fig. 1

Modified PRISMA flow diagram for variable selection into Foresight informed k-means obesity classification of the UK Biobank cohort

Genetic Risk Score [24]

For the analysis, k-means clustering is used to derive the obesity classification [25]. Classification methods attempt to group together participants who are most similar across a number of variables describing their characteristics, so that participants in each class are more similar to their fellow class members than to those in other classes.

In this study the classification was carried out using the k-means algorithm, which is flexible and efficient and capable of handling the large volume of observations in our data set. Other techniques are available, e.g. hierarchical clustering or mixture models such as Latent Class/Profile Analysis [26], but these can be computationally heavy and require large amounts of memory when working with large datasets.
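To make the assign-and-update loop concrete, here is a minimal pure-Python sketch of k-means (Lloyd’s algorithm) that also returns the within-class sum of squares (WCSS) used later to choose the number of classes. It assumes the inputs are already transformed and z-scored, starts from randomly chosen data points and keeps the best of several restarts (`n_init` is a hypothetical parameter here). It is an illustration, not the software used in the study; at UK Biobank scale an optimised library implementation would be used.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
Result = Tuple[List[List[float]], List[int], float]  # (centres, labels, WCSS)


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(data: List[Point], k: int, rng: random.Random, max_iter: int = 100) -> Result:
    """One run of Lloyd's algorithm starting from k randomly chosen data points."""
    centres = [list(p) for p in rng.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each participant joins the nearest class centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in data]
        if new_labels == labels:
            break  # no participant changed class: converged
        labels = new_labels
        # Update step: each centre moves to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # an empty class keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(sq_dist(p, centres[lab]) for p, lab in zip(data, labels))
    return centres, labels, wcss


def kmeans(data: List[Point], k: int, n_init: int = 10, seed: int = 0) -> Result:
    """Best of `n_init` random restarts, judged by the lowest WCSS."""
    rng = random.Random(seed)
    return min((lloyd(data, k, rng) for _ in range(n_init)), key=lambda r: r[2])


if __name__ == "__main__":
    # Toy z-scored data with two obvious groups (illustrative only).
    pts = [[-1.0, -1.1], [-0.9, -1.0], [-1.2, -0.8], [1.0, 1.1], [0.9, 1.2], [1.1, 0.9]]
    centres, labels, wcss = kmeans(pts, k=2)
    print(labels, round(wcss, 3))
```

Restarts matter because Lloyd’s algorithm only finds a local optimum that depends on the starting centres.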

For k-means to be applied successfully, the variables should not be skewed, so that symmetric classes can form, and each variable should be measured on a similar scale, so that each contributes a similar weight to the classification process. Skewness is corrected using Tukey’s ladder of powers [27], and a common scale is achieved by converting each variable to a z-score. Prior to classification, the correlation between variables is measured to assess potential redundancy, where two or more variables essentially capture the same dimension.
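The sketch below illustrates the two preprocessing steps: picking a power from Tukey’s ladder for a skewed variable, then converting the result to a z-score. The ladder rungs, the criterion of minimising absolute skewness and the shift applied to non-positive values are assumptions made for illustration; the section does not say how the power was chosen for each variable.

```python
import math
import statistics
from typing import List, Sequence, Tuple

# Tukey's ladder of powers; 0 stands for the log transform (assumed rungs).
LADDER = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]


def ladder_power(x: float, lam: float) -> float:
    """Apply one rung of the ladder; negative powers are negated to keep order."""
    if lam == 0:
        return math.log(x)
    return x ** lam if lam > 0 else -(x ** lam)


def skewness(xs: Sequence[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def tukey_transform(xs: Sequence[float]) -> Tuple[float, List[float]]:
    """Return the ladder power with the smallest |skewness| and the transformed data."""
    lowest = min(xs)
    if lowest <= 0:  # assumption: shift so that logs and negative powers are defined
        xs = [x - lowest + 1 for x in xs]
    best = min(LADDER, key=lambda lam: abs(skewness([ladder_power(x, lam) for x in xs])))
    return best, [ladder_power(x, best) for x in xs]


def z_scores(xs: Sequence[float]) -> List[float]:
    """Standardise to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


if __name__ == "__main__":
    skewed = [1, 2, 4, 8, 16, 32, 64, 128]  # toy right-skewed values
    lam, transformed = tukey_transform(skewed)
    print(lam, [round(z, 2) for z in z_scores(transformed)])
```

For this toy doubling series the log rung (power 0) removes the skew, after which every variable sits on the same z-scale.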

To determine the number of classes, we follow custom and practice with k-means and inspect scree plots of the within-class sum of squares, together with the reductions in this value as the number of classes increases from 1 to 12. A further criterion is that the resulting classes should be similar in size, with no class containing a particularly large or small proportion of participants. The k-means algorithm cannot handle incomplete cases, so participants for whom information is unknown or was not supplied are excluded. Because k-means clustering can only use continuous or integer count variables, we selected these for inclusion in the classification; the categorical variables that match the obesity systems map are instead used to profile the new classes.
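Two of these practical requirements, complete cases and reasonably balanced class sizes, can be checked with a few lines of code. This is a sketch under stated assumptions: ‘do not know’ and ‘prefer not to answer’ responses are assumed to have been recoded to `None` beforehand, the variable names are hypothetical, and the 5% and 25% share thresholds are illustrative rather than taken from the study.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping, Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing (non-responses assumed recoded to None)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(rows: List[Mapping[str, Optional[float]]],
                   variables: Sequence[str]) -> List[Mapping[str, Optional[float]]]:
    """Listwise deletion: keep participants with every classification variable present."""
    return [r for r in rows if not any(is_missing(r.get(v)) for v in variables)]


def class_shares(labels: Sequence[int]) -> Dict[int, float]:
    """Proportion of participants in each class."""
    n = len(labels)
    return {c: count / n for c, count in sorted(Counter(labels).items())}


def sizes_acceptable(labels: Sequence[int], min_share: float = 0.05,
                     max_share: float = 0.25) -> bool:
    """Hypothetical thresholds: reject solutions with a very small or very large class."""
    shares = class_shares(labels).values()
    return min(shares) >= min_share and max(shares) <= max_share


if __name__ == "__main__":
    rows = [{"tv_hours": 2.0, "grip": 30.0}, {"tv_hours": None, "grip": 25.0},
            {"tv_hours": 1.0, "grip": float("nan")}, {"tv_hours": 3.0, "grip": 41.0}]
    print(len(complete_cases(rows, ["tv_hours", "grip"])))  # 2 complete cases
```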

To understand the nature of each class, its centre is calculated: essentially the mean value of each classification variable (listed in Table 1) across the observations in that class. This helps to derive ‘pen portraits’ for each class: some classes contain participants who, in aggregate, have high (or low) values for certain variables, and these variables provide the narrative for the pen portrait. How the classes profile against other variables that were not used to drive the classification is also insightful. These additional variables come from UK Biobank and are also highlighted within the Foresight obesity map. The plausibility of the classes and their pen portraits helps to validate the classification outcome.
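Because the classification variables are z-scores, a class centre can be read directly: values far from 0 mark where a class departs from the sample average, which is what the pen portraits describe. The sketch below computes centres and lists each class’s distinctive variables; the 0.5 standard-deviation cut-off and the variable names are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def class_centres(z: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean z-score of every classification variable within each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, lab in zip(z, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sorted(sums.items())}


def distinctive(centre: Sequence[float], names: Sequence[str],
                threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Variables whose class mean sits at least `threshold` SDs from the sample
    mean (0 on the z-scale); the 0.5 cut-off is a hypothetical choice."""
    return [(name, round(v, 2)) for name, v in zip(names, centre) if abs(v) >= threshold]


if __name__ == "__main__":
    z = [[1.0, 0.0], [0.6, 0.2], [-0.8, 0.1], [-0.8, -0.3]]  # toy z-scores
    centres = class_centres(z, [0, 0, 1, 1])
    for lab, centre in centres.items():
        print(lab, distinctive(centre, ["grip_strength", "tv_hours"]))
```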

Table 1 Variables used for classification

Pen portrait names were assigned to the classes following a workshop with 35 multidisciplinary academics in attendance that aimed to generate meaningful names that were considered to be non-stigmatising.

In the results presented below, descriptive statistics are tabulated and multinomial regression models are estimated, using cluster-robust standard errors clustered by the assessment centre visited, to test whether the new classification can predict weight status. The purpose of the multinomial regression is to test whether membership of each class is associated, in a statistically significant sense, with a participant’s weight status. In the regression, the relative risk ratios indicate the risk of being overweight or obese compared with a healthy weight (with underweight participants omitted, n = 1699), relative to the reference class ‘Younger, urban, hard-pressed’. An adjusted model is also estimated that additionally adjusts for gender, ethnicity, health, qualifications and employment status, since these variables map onto the Foresight obesity map but, due to their categorical character, are not included in the classification. To account for multiple testing, we adopted an alpha level of 1% rather than 5% to judge the significance of the findings.
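For intuition about what a relative risk ratio (RRR) means here: when class membership is the only predictor, the multinomial logit is saturated and its maximum-likelihood RRR equals a simple ratio of odds from the class-by-weight-status table. The sketch below shows this unadjusted case with a naive Wald interval at the 1% level. It does not reproduce the cluster-robust standard errors or the adjusted model, and the counts are made up for illustration.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple

Counts = Dict[Tuple[str, str], int]  # (class, weight status) -> number of participants


def relative_risk_ratio(counts: Counts, cls: str, ref: str, outcome: str,
                        base: str = "healthy") -> float:
    """Unadjusted RRR of `outcome` versus `base` for class `cls` relative to `ref`.

    With class membership as the only predictor, the multinomial logit is
    saturated and its maximum-likelihood RRR equals this ratio of odds.
    """
    return (counts[(cls, outcome)] / counts[(cls, base)]) / (
        counts[(ref, outcome)] / counts[(ref, base)])


def rrr_ci(counts: Counts, cls: str, ref: str, outcome: str,
           base: str = "healthy", alpha: float = 0.01) -> Tuple[float, float, float]:
    """RRR with a Wald interval on the log scale (naive, NOT cluster-robust)."""
    rrr = relative_risk_ratio(counts, cls, ref, outcome, base)
    cells = (counts[(cls, outcome)], counts[(cls, base)],
             counts[(ref, outcome)], counts[(ref, base)])
    se = math.sqrt(sum(1 / n for n in cells))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576 for alpha = 1%
    return rrr, rrr * math.exp(-z * se), rrr * math.exp(z * se)


if __name__ == "__main__":
    # Made-up counts for illustration only (not UK Biobank results).
    toy = {("ref", "healthy"): 400, ("ref", "overweight"): 300, ("ref", "obese"): 100,
           ("A", "healthy"): 200, ("A", "overweight"): 300, ("A", "obese"): 200}
    print(rrr_ci(toy, "A", "ref", "obese"))
```

An interval excluding 1 corresponds to significance at the chosen alpha; clustering by assessment centre would typically widen these intervals.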


Variable selection

Figure 1 presents the results of mapping UK Biobank variables to the Foresight map using a modified PRISMA flow chart: 23 variables are identified for inclusion in the classification and a further 20 variables are used to profile the classification (see Appendix S1 for how these variables are constructed). The 23 classification variables derived from UK Biobank are shown in Table 1.

After removing participants who were pregnant, since their weight status is affected by the pregnancy (n = 371), those younger than 40 or older than 69, which was outside the recruitment criteria for the cohort (n = 2431), and those with missing data (n = 154,613), 345,091 participants are available for classification. The Greenspace variable is only available for participants located in England [30, 31], so participants living in Scotland and Wales are not part of this classification. The count of food establishments within a straight-line distance of 1000 m was a bespoke variable, calculated using the 1 km-rounded coordinates of each participant’s home location at the time of their assessment centre visit and a database of Points of Interest [32]. Since we and others have found that the numbers of healthy and unhealthy food establishments in a neighbourhood are positively correlated, we did not differentiate between them [33, 34].
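The bespoke food-environment variable is essentially a radius count. A minimal sketch follows, assuming home and outlet locations are given as easting/northing pairs in metres on a projected grid and that the home coordinates are already 1 km-rounded; the brute-force scan is for clarity only, and a spatial index would be the practical choice for hundreds of thousands of participants.

```python
import math
from typing import Iterable, Tuple

Coord = Tuple[float, float]  # (easting, northing) in metres on a projected grid (assumed)


def count_within(home: Coord, outlets: Iterable[Coord], radius_m: float = 1000.0) -> int:
    """Count food outlets within a straight-line (Euclidean) distance of `home`.

    `home` is assumed to be the already 1 km-rounded residential location.
    Outlets exactly on the boundary are counted.
    """
    hx, hy = home
    return sum(1 for ox, oy in outlets if math.hypot(ox - hx, oy - hy) <= radius_m)


if __name__ == "__main__":
    home = (451000.0, 206000.0)  # hypothetical 1 km-rounded coordinates
    outlets = [(451000.0, 206500.0), (451000.0, 207000.0), (452100.0, 206000.0)]
    print(count_within(home, outlets))  # 2: the outlets at 500 m and at 1000 m
```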

The distribution of key demographics for both the larger sample (excluding only those who are pregnant, younger or older) and the sub-sample used for classification is provided in Table 2. The sub-sample compares well with the full sample for most measures, with two exceptions: Townsend deprivation, where there is an indication that the sub-sample is less deprived, and geography, since Scotland and Wales are not represented. For this classification sample, no pair of variables had a correlation greater than 0.7, as can be seen in the plot of the correlation matrix in Fig. 2, and as a result all the variables listed in Table 1 are used.
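The redundancy check amounts to scanning every pair of classification variables for a high correlation. The sketch below uses Pearson correlation and assumes the 0.7 threshold applies to the absolute value (the text does not specify); the toy columns are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns: Dict[str, Sequence[float]],
                    threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """Pairs of variables whose absolute correlation exceeds `threshold`."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


if __name__ == "__main__":
    cols = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10.5], "z": [3, 1, 4, 1, 5]}
    print(redundant_pairs(cols))  # only the x/y pair is flagged
```

An empty result, as reported for this classification sample, means every variable can be retained.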

Table 2 Comparison of the characteristics of the full sample and the sub-sample used for classification
Fig. 2

Heatmap of the correlation between classification variables

The profiling variables are shown in Table 3 and cover aspects such as the socio-demographic composition (e.g. gender and ethnicity), health status and illnesses (including specific morbidities), socio-economic composition (e.g. employment and occupation), the nature of their work (tasks involved and satisfaction) and geography.

Table 3 Variables used for profiling of the classification

Cluster analysis

The left-hand scree plot in Fig. 3 shows the within-class sum of squares for various values of k, whilst the right-hand plot shows the reduction in this statistic as k increases. The left-hand plot can be difficult to interpret, so attention focuses on the right-hand plot of the reduction gained as k increases. These reductions are in effect the gradient of the scree plot, and we are looking for the value of k at which this gradient changes. There are large reductions up to 5 classes, more modest reductions from 5 through to 8, and smaller reductions after 8, to the extent that they could be considered linear. We selected a generous 8-class solution that provides the largest scope for identifying a diverse range of classes, while mindful that checks are required to ensure sufficient differentiation between these 8 classes and that no class is too small or too large.
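The right-hand plot is simply the first difference of the WCSS sequence. The sketch below computes those reductions and applies one possible automatic rule for where the curve flattens. The rule and its 2% tolerance are hypothetical (the study chose k by inspection), and the WCSS values are made up purely to exercise the code.

```python
from typing import Dict, List


def wcss_reductions(wcss: List[float]) -> Dict[int, float]:
    """Drop in WCSS from k-1 to k classes, where wcss[0] is the k = 1 value."""
    return {k: wcss[k - 2] - wcss[k - 1] for k in range(2, len(wcss) + 1)}


def flattening_point(wcss: List[float], tol: float = 0.02) -> int:
    """Hypothetical rule: the smallest k after which every further reduction is
    below `tol` times WCSS(k=1), i.e. the scree curve has become shallow."""
    red = wcss_reductions(wcss)
    for k in sorted(red):
        if all(red[j] < tol * wcss[0] for j in red if j > k):
            return k
    return len(wcss)


if __name__ == "__main__":
    # Made-up WCSS values for k = 1..12, chosen only to illustrate the calculation.
    toy = [100.0, 70.0, 52.0, 42.0, 35.0, 31.0, 28.0, 25.5, 24.5, 23.7, 23.0, 22.4]
    print(wcss_reductions(toy))
    print(flattening_point(toy))
```

Any such rule is only a guide; class sizes and the interpretability of the pen portraits still have to be checked.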

Fig. 3

Scree plots of the within class sum of squares and its first difference for a range of classes

To this end, the class centres and the number of participants in each class are shown in Table 4, with radial plots for the classes available in Supplementary Fig. S1. A pen portrait is constructed for each class using the characteristics described in Table 4 alongside the distribution of counts across categories of the profiling variables (which are not used in the classification), shown in Supplementary Tables S1-S18 with indicators of particularly high and low percentages relative to the percentage in the whole classification sample. The pen portraits, presented in Table 5, indicate sufficient differentiation between the classes, and even the smallest class is still large, representing nearly 8% of participants.
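A common way to express ‘particularly high and low percentages relative to the whole classification sample’ is an index: the share of a category within a class divided by its share in the whole sample, times 100. The sketch below assumes that form; the 120 and 80 flag cut-offs are hypothetical and not taken from the supplementary tables.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def profile_index(labels: Sequence[Hashable],
                  categories: Sequence[Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Index = 100 x (share of a category within a class) / (share in the whole sample).

    100 means the class matches the whole classification sample; categories absent
    from a class are simply omitted (their index would be 0).
    """
    n = len(labels)
    overall = Counter(categories)
    class_sizes = Counter(labels)
    index: Dict[Hashable, Dict[Hashable, float]] = {}
    for (cls, cat), count in Counter(zip(labels, categories)).items():
        within = count / class_sizes[cls]
        index.setdefault(cls, {})[cat] = 100 * within / (overall[cat] / n)
    return index


def flag(index_value: float, high: float = 120.0, low: float = 80.0) -> str:
    """Hypothetical cut-offs for 'particularly high' or 'particularly low'."""
    return "high" if index_value >= high else "low" if index_value <= low else ""


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]  # toy class memberships
    employment = ["employed", "employed", "employed", "retired",
                  "retired", "retired", "retired", "employed"]
    print(profile_index(labels, employment))
```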

Table 4 Class centres on classification variables
Table 5 Pen portraits for classification groups

Table 6 shows a distinct profile for our classes by BMI category. Recall that BMI was not used during the classification exercise, yet the proxies for the drivers of obesity identified in the Foresight obesity system map reveal differing weight status outcomes. The classes with the greatest proportion of people with a healthy weight are ‘Younger, urban hard-pressed’ and ‘Comfortable, fit families’. Conversely, ‘Deprived with less healthy lifestyles’ is among the classes with the lowest proportion of people with a healthy weight. The ‘Active manual workers’ class has a high proportion in the overweight category, while ‘Stressed and not in work’ and ‘Deprived with less healthy lifestyles’ have similarly high proportions in the obesity category.

Table 6 Distribution of classification by BMI category, median and mean

A tabulation of the classification by the region of the assessment centre attended (Table 7) also shows some interesting spatial patterns. London centres dominate the ‘Younger, urban hard-pressed’ class, with nearly 50% of participants in this class attending a London centre, compared with less than 20% nationally. ‘Healthy, active and retirees’ also show a concentration in London. The more satisfied classes of ‘Comfortable, fit families’, ‘Content, rural and retirees’ and ‘Comfortable professionals’ are also spatially concentrated, having attended assessment centres in southern England. The Midlands and northern assessment centres have a high concentration of ‘Stressed and not in work’ participants, whilst the ‘Deprived with less healthy lifestyles’ are concentrated in the North East and North West of England. The ‘Active manual workers’ class is split fairly evenly across the assessment centres, except for London.

Table 7 Distribution of classification by location of assessment centre

The estimates from the multinomial regression models shown in Fig. 4 give the relative risk ratios for being overweight or having obesity, compared with a healthy weight, relative to the reference class ‘Younger, urban, hard-pressed’. For overweight status these relative risk ratios are all significantly greater than 1.0, indicating significant differentiation between our classes. The relative risk ratio is higher for ‘Active manual workers’, ‘Stressed and not in work’ and ‘Deprived with less healthy lifestyles’. For obesity, the ‘Comfortable, fit families’, ‘Healthy, active and retirees’ and ‘Content, rural and retirees’ classes have relative risk ratios that are not significantly different from that of the reference class ‘Younger, urban, hard-pressed’. A second model is estimated with some of the aforementioned profiling variables that are not used in the classification: gender, ethnicity, self-rated overall health, qualifications and current employment status. In general, this further adjustment attenuates the relative risk ratios, especially for the classes whose weight status is generally less healthy.

Fig. 4

Multinomial logistic regression models to investigate the association between the whole system classes and weight status. The relative risk ratios indicate risk of being overweight or having obesity compared to a healthy weight (with underweight omitted) and compared to the reference class ‘Younger, urban, hard-pressed’. Adjusted models are adjusted for gender, ethnicity, health, qualifications and employment status


This work is an important development that recognises the need for a whole systems approach to obesity research and policy making and incorporates this into the development of a classification to identify groups of individuals at highest risk of obesity - according to elements identified by the Foresight obesity systems map [5]. Cohort studies like the UK Biobank study capture a rich array of information on a large number of people, encompassing many of the seven Foresight themes: Societal influences, individual psychology, individual physical activity, physical activity environment, physiology, food production and food consumption.

Using the UK Biobank cohort study allowed us to evaluate how well our Foresight-informed clusters predict obesity using high-quality measures of our outcome of interest, overweight and obesity, since height and weight were measured rather than self-reported.

To achieve our aims we took an existing large cohort of health and lifestyle data and attempted to map its data onto the Foresight obesity map. Whilst this process identified some gaps in the data, this is perhaps to be expected, given that the data were not collected specifically for the obesity map and given the extensive nature of the map, which incorporates nearly 110 variables. However, the approach to variable selection illustrated here is, to the best of our knowledge, novel and could be replicated across other cohorts and data sources worldwide where quality overweight and obesity data may not be available, or where obesity-related outcomes are of interest. A recent example is our study investigating whether an obesity classification can be used to identify individuals at risk of severe COVID-19 symptoms [35].

The classes identified in this study highlight eight groups of people who are similar in respect of their exposure to known drivers of obesity. The ‘Comfortable professionals’ class was the most typical of UK Biobank participant characteristics, and this extended to their BMI, which was close to the cohort average. The classification also highlights three classes with increased relative risk of being overweight or having obesity: ‘Active manual workers’, ‘Stressed and not in work’ and ‘Deprived with less healthy lifestyles’.

On completion of the research the classification will be deposited back into the UK Biobank and made available to other researchers.

Social and environmental interventions present opportunities for change that can be implemented at both a micro and macro level, in communities and whole countries by local and national governments. However, it is not yet clear what the most effective approaches would be and how to synchronise changes across the system. In order to inform such change, we first need to better understand the collective characteristics and behaviours of people who are overweight or have obesity and how these differ from those who maintain a healthy weight. Traditionally, insights of this kind are generated from cohort studies that collect a wealth of data spanning individual behaviours, demographic characteristics, anthropometric measures and health related metrics. These encompass multiple areas of the obesity system, however they mostly relate to the individual, rather than wider environmental, societal and food system determinants. The classification we present here moves beyond using solely individual behaviours and incorporates systems drivers of behaviours enabling macro level insights into obesity risk.

A more holistic approach in obesity research has existed for some time; for example, clusters of lifestyle behaviours have been used in different settings to assist policy makers in targeting populations for a range of interventions. Often such clusters focus on individual elements of lifestyle, such as dietary patterns [36, 37]; in some cases broader lifestyle clusters are identified [38, 39]. While insightful, data-driven clusters of these kinds are not easily comparable across cohorts and populations. The variables that drive the clusters are selected through a range of methods such as principal component or factor analysis or, more simply, by the information that is available within the cohort. More generic clustering solutions such as geodemographic classifications, originally generated with marketing in mind, have been used by local government organisations and have demonstrated utility in highlighting groups with a higher prevalence of obesity and related comorbidities [40,41,42,43]. Some geodemographic solutions have been tailored to specific application domains, for example CACI’s ACORN Wellbeing classification [44], which segments the population into four categories (Health Challenges; At Risk; Caution; Healthy) and further segregates these into 25 groups. Input data for such geodemographic classifications are largely derived from census data, open data or aggregated commercial sources. In research cohort studies, rich data are available at the individual level, without the need to combine aggregated data sources using computational models to estimate patterns. Combining insight from geodemographic classifications with rich cohort data using a robust framework such as the Foresight system map, as demonstrated here, therefore opens up exciting possibilities for better understanding and leveraging insight relevant to the whole obesity system.

The UK Biobank cohort is an example of an important study that is already contributing to obesity research, albeit focusing on specific areas of the obesity system, rather than the whole system. Using UK Biobank, [45] and [46] examine the influence of the availability of fast food on obesity. Activity levels are studied as commuting behaviour in two studies reported by Flint and Cummins [47] and Flint, Webb [48] and by an examination of how the built environment can influence activity patterns [9, 48]. Other potential influences for obesity have also been studied using UK Biobank participant data, including socio-economic factors [49], smoking [50], presence of morbidities [51], work patterns [52] and ethnicity [53]. In this research study, we incorporate all relevant drivers of obesity as identified by the theoretical framework presented in the Foresight obesity system map, therefore extending what has been done before using either individual or environmental influences upon obesity.

Green, Strong [54] conducted a classification exercise, based solely on data from participants classified as having obesity, meaning that the purpose of the classification was not to differentiate different weight statuses. Furthermore, this classification did not use the Foresight obesity system map to inform the classification development.


We found that Foresight themes relating to individual behaviours were better captured within the cohort data, while the environmental, societal and food production themes were more difficult to populate. Given that the UK Biobank cohort was established before the 2007 Foresight report, it is not surprising that it did not collect information from participants about their exposure to aspects of the wider obesity system, especially since the aims of the UK Biobank study are broad, seeking a better understanding of the determinants of disease in general, not just obesity. The Foresight obesity systems map incorporates some upstream determinants of behaviours, for example elements of food production in addition to food consumption [15]. With regard to food consumption, the sheer volume of information was challenging to incorporate into the classification. This is less of an issue because we were not trying to generate dietary pattern clusters but whole systems obesity clusters, so collapsing the dietary information into a small group of food consumption variables is acceptable; this process is explained in Supplementary Appendix S1. While some dietary information is lost using this approach, it achieved our goal of condensing the dietary information so that it did not dominate the classification. It is also important to acknowledge that dietary data collected in the UK Biobank were self-reported and therefore subject to bias. However, a recent validation study of the Oxford WebQ dietary questionnaire suggests reasonable agreement against biomarkers when compared with dietary recall interviews [55]. Other self-reported measures, such as TV watching, were also used in our classification, and we are not aware that these have undergone such scrutiny.

It was originally envisaged that more variables would be available to map onto the Physical Activity Environment Foresight theme from the UK Biobank Urban Morphometric Platform bulk data [56]. However, on receipt of these data we found issues with completeness and interpretability, for example incomplete greenness data and inconsistency between outlet density and count metrics in the same field. Recognising that there was still a need to incorporate some linked environmental data capturing the context in which individual behaviours are conditioned, we used the alternative greenspace indicator of Wheeler [31] and calculated our own food environment exposure measure, recognising that counts of healthy and unhealthy food establishments are positively correlated [33, 34].

In developing the new classification we found that both age and gender were important with respect to the outcome, BMI, and also as confounding factors for some of the classification variables, e.g. hand grip strength. Supplementary Appendix S1 explains in detail how gender was used to standardise these variables. While these characteristics do not appear in the Foresight maps as features of the obesity system, they are still important to consider when developing methods to better understand obesity. This suggests that the Foresight obesity system map could usefully encompass more nodes.
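One plausible way to remove a gender effect from a variable such as grip strength is to z-score each value against the mean and standard deviation of its own gender group, as sketched below. This is an assumption for illustration only; the actual procedure is described in Appendix S1 and may differ, and the values shown are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def standardise_within(values: Sequence[float], groups: Sequence[str]) -> List[float]:
    """z-score each value against the mean and SD of its own group (e.g. gender),
    so that a variable such as grip strength is compared like with like."""
    by_group: Dict[str, List[float]] = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    stats: Dict[str, Tuple[float, float]] = {
        g: (statistics.fmean(vs), statistics.pstdev(vs)) for g, vs in by_group.items()}
    return [(v - stats[g][0]) / stats[g][1] for v, g in zip(values, groups)]


if __name__ == "__main__":
    grip = [40.0, 50.0, 20.0, 30.0]  # hypothetical grip strength (kg)
    sex = ["male", "male", "female", "female"]
    print(standardise_within(grip, sex))  # [-1.0, 1.0, -1.0, 1.0]
```

After this step, a strong grip for a woman and a strong grip for a man both map to a positive z-score, so gender no longer drives the variable’s contribution to the clustering.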

This study employed a comprehensive variable selection process, driven by the Foresight obesity system map, to develop the obesity classification. The process was rigorous: it was completed by three independent researchers from a range of backgrounds, and agreement on variable selection was reached through informed discussion. However, there are limitations to this approach, especially the omission of categorical variables, which cannot easily be used in the k-means clustering algorithm. Other approaches exist that can deal with categorical variables [57, 58]; however, all algorithms require decisions and compromises to be made, and k-means proved efficient and effective in producing distinct classes that exhibit significant differences in both weight status and the profiling variables identified in Table 3.

Here we only considered baseline BMI, as follow-up measurements in the cohort were only taken for subsamples. We did investigate the agreement between measurements at the multiple time points and found that baseline BMI was a strong indicator of subsequent BMI, making it a suitable measure against which to test our classification. In the same vein, we did not use other candidate variables that were only available for a subset of UK Biobank participants, such as the accelerometer data.
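The article does not state which agreement statistic was used. One common choice for paired measurements such as baseline and follow-up BMI is a Bland-Altman analysis, sketched below under that assumption: it reports the mean difference (bias) and the 95% limits of agreement, bias plus or minus 1.96 standard deviations of the differences, which presumes the differences are roughly normally distributed. The function name `bland_altman` and the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(baseline, follow_up):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumption: Bland-Altman agreement is an illustrative choice, not
    necessarily the statistic used in the study.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired measurements")
    # Difference for each participant: follow-up minus baseline
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


bias, lower, upper = bland_altman([25, 30, 28, 22], [26, 30, 27, 23])
```

A small bias with narrow limits would support treating baseline BMI as a proxy for later measurements.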

Some caution must be taken when interpreting the results in this paper, given that UK Biobank participants are, by design, drawn from an older demographic that is largely White British in ethnicity and is not generalisable to the wider UK population [59]. As a result, although obesity is seen to persist from childhood through adolescence, adulthood and middle age into the senior years, only the later phase of this life course is captured here [60, 61]. That said, any biases in these data do not affect the methods and process we present for selecting variables and developing such a classification, since k-means does not require a sample to be representative of a population in order to make certain inferences.

Policy recommendations and future applications

We recommend using the Foresight obesity systems map as a framework to inform variable selection for obesity research and to drive policy making. We have demonstrated its ability to produce a clustering solution within the UK Biobank. While the classes derived from the clustering remain specific to the population the data relate to, the variable selection method is consistent, meaning it can be reproduced with different data and the results compared. However, for this to be applied most effectively, we would recommend a broader representation of data from each of the Foresight themes, which may be achieved through more targeted data collection in cohort studies of this kind, or by collating other sources of information on the system, such as consumer data. A combination of existing data sources would present a powerful alternative to new primary data collection.

A further development of our classification, which would enrich understanding of the areas and groups of people most in need of positive change, would be to incorporate a geographic identifier into the whole system classification, akin to geodemographic classifications such as the Output Area Classification from the Office for National Statistics [62] or commercial classifications such as Cameo [63]. With geographic identifiers incorporated, this type of classification could be used to better target resources and support: for example, national policy makers could use it to allocate funds to the areas most in need of system-wide change, and public health directorates in local authorities could in turn use it to allocate resources to the neighbourhoods most requiring support.

Overweight and obesity have been substantial public health challenges for some time, but in light of the recent ‘call to action’ within the UK’s National Obesity Strategy, which cites COVID-19 as a wake-up call, methods to better target resources to improve health by reducing the prevalence of overweight and obesity are more important than ever [64]. Indeed, we have found that our classification reveals significant differences in exposure, treatment and mortality for COVID-19 when assessing outcomes in the linked test, hospitalisation and death data available within UK Biobank [35].


This work presents an innovative new approach to better understanding the whole systems drivers of obesity which has the potential to produce meaningful tools for policy makers to better target interventions across the whole system to reduce overweight and obesity.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to the access arrangements required by UK Biobank. Please contact for details. Specifically, these data can be obtained by registering an interest and submitting an application for data access, quoting Project ID 30846. The software for the analysis is available from the corresponding author on reasonable request.

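Change history

11 April 2022: A Correction to this article was published.

Abbreviations

BMI: Body Mass Index
COVID-19: Coronavirus disease
OECD: Organisation for Economic Co-operation and Development
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
UK: United Kingdom of Great Britain and Northern Ireland
UKB: UK Biobank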


  1. World Health Organisation. Obesity and Overweight 2020 Available from:
  2. World Obesity. Prevalence of Obesity 2020 Available from:
  3. Medical Research Council. Nutrition and Obesity 2020 Available from:
  4. Medical Research Council. Obesity research priorities 2020 Available from:
  5. Butland B, Jebb S, Kopelman P, McPherson K, Thomas S, Mardell J, et al. Tackling obesities: future choices-project report: Department of Innovation, Universities and Skills London; 2007 Available from:
  6. Lee BY, Bartsch SM, Mui Y, Haidari LA, Spiker ML, Gittelsohn J. A systems approach to obesity. Nutr Rev. 2017;75(suppl 1):94–106.
  7. Ulijaszek S. With the benefit of foresight: Obesity, complexity and joined-up government. BioSocieties. 2015;10(2):213–28.
  8. Burgoine T, Forouhi NG, Griffin SJ, Brage S, Wareham NJ, Monsivais P. Does neighborhood fast-food outlet exposure amplify inequalities in diet and obesity? A cross-sectional study. Am J Clin Nutr. 2016;103(6):1540–7.
  9. Sarkar C. Residential greenness and adiposity: findings from the UK biobank. Environ Int. 2017;106:1–10.
  10. Swinburn B, Egger G, Raza F. Dissecting obesogenic environments: the development and application of a framework for identifying and prioritizing environmental interventions for obesity. Prev Med. 1999;29(6 Pt 1):563–70.
  11. Devaux M, Goryakin Y, Cecchini M, Huber H, Colombo F. OECD Obesity Update 2017. Paris: OECD; 2017. Available from:
  12. Miles I, Keenan M. Ten Years of Foresight in the UK. Tokyo: paper presented at the second international conference on technology foresight; 2003.
  13. Government Office for Science. Reducing obesity: future choices 2007 Available from:
  14. Local Government Association. Making obesity everybody’s business: a whole systems approach to obesity: Local Government Association; 2017.
  15. Morris MA, Wilkins E, Timmins KA, Bryant M, Birkin M, Griffiths C. Can big data solve a big problem? Reporting the obesity data landscape in line with the foresight obesity system map. Int J Obes. 2018;42(12):1963–76.
  16. Bagnall AM, Radley D, Jones R, Gately P, Nobles J, Van Dijk M, et al. Whole systems approaches to obesity and other complex public health challenges: a systematic review. BMC Public Health. 2019;19(1):8.
  17. Allender S, Millar L, Hovmand P, Bell C, Moodie M, Carter R, et al. Whole of systems trial of prevention strategies for childhood Obesity: who stops childhood obesity. Int J Environ Res Public Health. 2016;13(11):1143.
  18. Johnston LM, Matteson CL, Finegood DT. Systems science and obesity policy: a novel framework for analyzing and rethinking population-level planning. Am J Public Health. 2014;104(7):1270–8.
  19. Lake AA, Henderson EJ, Townshend TG. Exploring planners’ and public health practitioners’ views on addressing obesity: lessons from local government in England. Cities Health. 2017;1(2):185–93.
  20. Allen N, Sudlow C, Downey P, Peakman T, Danesh J, Elliott P, et al. UK biobank: current status and what it means for epidemiology. Health Policy Techn. 2012;1(3):123–6.
  21. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12(3):e1001779.
  22. Hanscombe KB, Coleman JRI, Traylor M, Lewis CM. ukbtools: an R package to manage and query UK Biobank data. PLoS One. 2019;14(5):e0214311.
  23. Jebb S, Kopelman P, Butland B. Executive summary: foresight ‘tackling obesities: future choices’ project. Obes Rev. 2007;8:vi–ix.
  24. Mason KE, Palla L, Pearce N, Phelan J, Cummins S. Genetic risk of obesity as a modifier of associations between neighbourhood environment and body mass index: an observational study of 335 046 UK Biobank participants. BMJ Nutr Prev Health. 2020.
  25. Everitt B, Landau S, Leese M. Cluster analysis. 4th ed. London: Arnold; 2001.
  26. Oberski D. Mixture models: Latent profile and latent class analysis. In: Modern statistical methods for HCI: Springer; 2016. p. 275–87.
  27. Mosteller F, Tukey JW. Data analysis and regression: a second course in statistics; 1977.
  28. Ford N, Trott P, Simms C. Food portions and consumer vulnerability: qualitative insights from older consumers. Qual Market Res. 2019.
  29. Jensen GL, Hsiao PY. Obesity in older adults: relationship to functional limitation. Curr Opin Clin Nutr Metab Care. 2010;13(1):46–51.
  30. Department for Communities and Local Government. Generalised Land Use Database Statistics for England 2005 (Enhanced Basemap). 2007.
  31. Wheeler B. Documentation of environmental indicators attributed to participants based on home location grid references 2017 Available from:
  32. Ordnance Survey. Points of interest [TAB geospatial data]. England: EDINA Digimap Ordnance Survey Service; 2014.
  33. Hawkesworth S, Silverwood R, Armstrong B, Pliakas T, Nanchahal K, Sartini C, et al. Investigating the importance of the local food environment for fruit and vegetable intake in older men and women in 20 UK towns: a cross-sectional analysis of two national cohorts using novel methods. Int J Behav Nutr Phys Act. 2017;14(1):1–14.
  34. Hobbs M, Wilkins E, Lamb K, McKenna J, Griffiths C. Associations between food environment typologies and body mass index: evidence from Yorkshire, England. Soc Sci Med. 2019;239:112528.
  35. Clark S, Morris M, Lomax N, Birkin M. Can a data driven obesity classification system identify those at risk of severe COVID-19 in the UK biobank cohort study? Int J Obes. 2021:1–5.
  36. Greenwood D, Cade J, Draper A, Barrett J, Calvert C, Greenhalgh A. Seven unique food consumption patterns identified among women in the UK Women’s cohort study. Eur J Clin Nutr. 2000;54(4):314–20.
  37. Kant AK. Dietary patterns and health outcomes. J Am Diet Assoc. 2004;104(4):615–35.
  38. Cameron AJ, Crawford DA, Salmon J, Campbell K, McNaughton SA, Mishra GD, et al. Clustering of obesity-related risk behaviors in children and their mothers. Ann Epidemiol. 2011;21(2):95–102.
  39. Patterson RE, Haines PS, Popkin BM. Health lifestyle patterns of US adults. Prev Med. 1994;23(4):453–60.
  40. Nottingham City Council. Healthy Weight Strategy for Nottingham City 2011-2020 2011 Available from:
  41. Petersen J, Gibin M, Longley P, Mateos P, Atkinson P, Ashby D. Geodemographics as a tool for targeting neighbourhoods in public health campaigns. J Geogr Syst. 2011;13(2):173–92.
  42. South East Public Health Observatory. Cardiovascular disease PCT health profile. Wakefield District; 2011. Available from:
  43. Sunderland City Council. Health Weight and Excess Weight 2017 Available from:
  44. CACI. The Wellbeing acorn user guide 2021 Available from:
  45. Burgoine T, Sarkar C, Webster CJ, Monsivais P. Examining the interaction of fast-food outlet exposure and income on diet and obesity: evidence from 51,361 UK biobank participants. Int J Behav Nutr Phys Act. 2018;15(1):71.
  46. Mason KE, Pearce N, Cummins S. Associations between fast food and physical activity environments and adiposity in mid-life: cross-sectional, observational evidence from UK biobank. Lancet Public Health. 2018;3(1):e24–33.
  47. Flint E, Cummins S. Active commuting and obesity in mid-life: cross-sectional, observational evidence from UK biobank. Lancet Diabetes Endocrinol. 2016;4(5):420–35.
  48. Flint E, Webb E, Cummins S. Change in commute mode and body-mass index: prospective, longitudinal evidence from UK biobank. Lancet Public Health. 2016;1(2):e46–55.
  49. Tyrrell J, Jones SE, Beaumont R, Astley CM, Lovell R, Yaghootkar H, et al. Height, body mass index, and socioeconomic status: mendelian randomisation study in UK biobank. BMJ. 2016;352:i582.
  50. Dare S, Mackay DF, Pell JP. Relationship between smoking and obesity: a cross-sectional study of 499,504 middle-aged adults in the UK general population. PLoS One. 2015;10(4):e0123579.
  51. Lyall DM, Celis-Morales C, Ward J, Iliodromiti S, Anderson JJ, Gill JMR, et al. Association of body mass index with cardiometabolic disease in the UK biobank: a mendelian randomization study. JAMA Cardiol. 2017;2(8):882–9.
  52. Wyse CA, Celis Morales CA, Graham N, Fan Y, Ward J, Curtis AM, et al. Adverse metabolic and mental health outcomes associated with shiftwork in a population-based study of 277,168 workers in UK biobank. Ann Med. 2017;49(5):411–20.
  53. Ntuk UE, Gill JM, Mackay DF, Sattar N, Pell JP. Ethnic-specific obesity cutoffs for diabetes risk: cross-sectional study of 490,288 UK biobank participants. Diabetes Care. 2014;37(9):2500–7.
  54. Green MA, Strong M, Razak F, Subramanian SV, Relton C, Bissell P. Who are the obese? A cluster analysis exploring subgroups of the obese. J Public Health (Oxf). 2016;38(2):258–64.
  55. Greenwood DC, Hardie LJ, Frost GS, Alwan NA, Bradbury KE, Carter M, et al. Validation of the Oxford WebQ online 24-hour dietary questionnaire using biomarkers. Am J Epidemiol. 2019;188(10):1858–67.
  56. Sarkar C, Webster C, Gallacher J. UK biobank urban morphometric platform (UKBUMP) – a nationwide resource for evidence-based healthy city planning and public health interventions. Ann GIS. 2015;21(2):135–48.
  57. Ahmad A, Khan SS. Survey of state-of-the-art mixed data clustering algorithms. IEEE Access. 2019;7:31883–902.
  58. Preud'homme G, Duarte K, Dalleau K, Lacomblez C, Bresso E, Smail-Tabbone M, et al. Head-to-head comparison of clustering methods for heterogeneous data: a simulation-driven benchmark. Sci Rep. 2021;11(1):4202.
  59. Fry A, Littlejohns TJ, Sudlow C, Doherty N, Adamska L, Sprosen T, et al. Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population. Am J Epidemiol. 2017;186(9):1026–34.
  60. Mizuno T, Shu IW, Makimura H, Mobbs C. Obesity over the life course. Sci Aging Knowledge Environ. 2004;2004(24):re4.
  61. Lee JM, Pilli S, Gebremariam A, Keirns CC, Davis MM, Vijan S, et al. Getting heavier, younger: trajectories of obesity over the life course. Int J Obes. 2010;34(4):614–23.
  62. Gale CG, Singleton AD, Bates AG, Longley PA. Creating the 2011 area classification for output areas (2011 OAC). J Spat Inf Sci. 2016;(12):1–27.
  63. TransUnionUK. CAMEO UK 2017 Available from:
  64. Department of Health and Social Care. Tackling obesity: government strategy 2020 Available from:



This research has been conducted using the UK Biobank resource. We would like to thank the UK Biobank participants.


This work was supported by the Economic and Social Research Council funded Consumer Data Research Centre (CDRC) - Grant reference - ES/S007164/1.

Author information




Conception: NL, MM, MB; design of the work: SC, NL, MM, MB; acquisition and analysis: SC, NL, MM; interpretation of data: SC, NL, MM; creation of new software used in the work: SC; drafting the work: SC, NL, MM; substantive revisions: SC, NL, MM, MB. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Stephen Clark.

Ethics declarations

Ethics approval and consent to participate

The UK Biobank study attained research ethics committee ethical approval ref.: 11/NW/0382. This study has been granted ethical approval by the University of Leeds ethics committee ref.: LTGEOG-034. 1. All participants are aged 16 or older and informed consent was obtained from all participants. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not Applicable.

Competing interests

None of the authors have any financial or non-financial competing interests to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been updated to correct supplementary file 3.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Clark, S., Lomax, N., Birkin, M. et al. A foresight whole systems obesity classification for the English UK biobank cohort. BMC Public Health 22, 349 (2022).




  • Overweight
  • Obesity
  • Whole systems
  • Classification
  • UK biobank
  • K-means
  • Variable selection