This work is an important development: it recognises the need for a whole systems approach to obesity research and policy making, and incorporates this into a classification that identifies groups of individuals at highest risk of obesity according to elements identified by the Foresight obesity systems map. Cohort studies like the UK Biobank study capture a rich array of information on a large number of people, encompassing many of the seven Foresight themes: societal influences, individual psychology, individual physical activity, physical activity environment, physiology, food production and food consumption.
Using the UK Biobank cohort study allowed us to evaluate how well our Foresight-informed clusters predict obesity using high-quality measures of our outcome of interest, overweight and obesity, because height and weight were measured rather than self-reported.
To achieve our aims we took an existing large cohort of health and lifestyle data and attempted to map its data onto the Foresight obesity map. Whilst this process identified some gaps in these data, this is perhaps to be expected given that the data were not collected specifically for the obesity map, and given the extensive nature of the map, which incorporates nearly 110 variables. However, the approach to variable selection as illustrated is, to the best of our knowledge, novel and has the potential to be replicated across other cohorts and data sources worldwide where quality overweight and obesity data may not be available, or where obesity related outcomes are of interest. A recent example of this can be seen in our study investigating whether an obesity classification can be used to identify individuals at risk of severe COVID-19 symptoms.
The classes we identified in this study highlight eight groups of people who are similar with respect to their exposure to known drivers of obesity. The ‘Comfortable professionals’ class was most typical of the UK Biobank participant characteristics, and this extended to their BMI, which was close to the cohort average. The classification also highlights three classes with increased relative risk of being overweight or having obesity: ‘Active manual workers’, ‘Stressed and not in work’ and ‘Deprived with less healthy lifestyles’.
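To make the notion of relative risk by class concrete, the following is a minimal sketch of the calculation. The counts, the class labels and the choice of reference class are hypothetical and are not results from this study; the reference group used in the analysis is not restated here, so it is treated as an assumption.

```python
def relative_risk(cases_group, n_group, cases_reference, n_reference):
    """Risk in the group of interest divided by risk in the reference group."""
    risk_group = cases_group / n_group
    risk_reference = cases_reference / n_reference
    return risk_group / risk_reference


# Hypothetical counts (NOT study results):
# class label -> (number overweight or with obesity, number in class)
counts = {
    "reference_class": (600, 1000),
    "class_a": (750, 1000),
    "class_b": (540, 1000),
}

# Assumption: one class is chosen as the reference for all comparisons.
ref_cases, ref_n = counts["reference_class"]
rr_by_class = {
    name: relative_risk(cases, n, ref_cases, ref_n)
    for name, (cases, n) in counts.items()
}

for name, rr in rr_by_class.items():
    print(f"{name}: RR = {rr:.2f}")
```

A value above 1 indicates a higher risk than the reference class and a value below 1 a lower risk; in practice confidence intervals would accompany each estimate.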
On completion of the research the classification will be deposited back into the UK Biobank and made available to other researchers.
Social and environmental interventions present opportunities for change that can be implemented at both a micro and macro level, in communities and whole countries, by local and national governments. However, it is not yet clear what the most effective approaches would be and how to synchronise changes across the system. In order to inform such change, we first need to better understand the collective characteristics and behaviours of people who are overweight or have obesity and how these differ from those who maintain a healthy weight. Traditionally, insights of this kind are generated from cohort studies that collect a wealth of data spanning individual behaviours, demographic characteristics, anthropometric measures and health related metrics. These encompass multiple areas of the obesity system; however, they mostly relate to the individual, rather than wider environmental, societal and food system determinants. The classification we present here moves beyond using solely individual behaviours and incorporates systems drivers of behaviours, enabling macro level insights into obesity risk.
A more holistic approach in obesity research has existed for some time; for example, clusters of lifestyle behaviours have been used in different settings to assist policy makers in targeting populations for a range of interventions. Often such clusters focus on individual elements of lifestyle, for example, dietary patterns [36, 37]. In some cases broader lifestyle clusters are identified [38, 39]. While insightful, data driven clusters of these kinds are not easily comparable against other cohorts and populations. Variables that drive the clusters are selected through a range of variable selection methods such as principal component or factor analysis, or more simply, according to the information that is available within the cohort. More generic clustering solutions such as geodemographic classifications, originally generated with marketing in mind, have been utilised by local government organisations and have demonstrated utility in highlighting groups with higher prevalence of obesity and related comorbidities [40,41,42,43]. Some geographic solutions have been tailored to specific application domain areas, for example CACI’s ACORN Wellbeing classification, which segments the population into four categories (Health Challenges; At Risk; Caution; Healthy) and further divides these into 25 groups. Input data for such geodemographic classifications are largely derived from census data, open data or aggregated commercial sources. In research cohort studies rich data are available at an individual level, without the need to combine aggregated data sources using computational models to estimate patterns. Therefore, combining insight from geodemographic classifications with rich cohort data, using a robust framework such as the Foresight system map, has demonstrated exciting possibilities for better understanding and leveraging insight relevant to the whole obesity system.
The UK Biobank cohort is an example of an important study that is already contributing to obesity research, albeit focusing on specific areas of the obesity system, rather than the whole system. UK Biobank data have been used to examine the influence of the availability of fast food on obesity. Activity levels have been studied in terms of commuting behaviour in two studies reported by Flint and Cummins and Flint, Webb, and through an examination of how the built environment can influence activity patterns [9, 48]. Other potential influences on obesity have also been studied using UK Biobank participant data, including socio-economic factors, smoking, presence of morbidities, work patterns and ethnicity. In this research study, we incorporate all relevant drivers of obesity as identified by the theoretical framework presented in the Foresight obesity system map, therefore extending previous work that considered either individual or environmental influences upon obesity.
Green, Strong conducted a classification exercise based solely on data from participants classified as having obesity, meaning that the purpose of the classification was not to differentiate between weight statuses. Furthermore, this classification did not use the Foresight obesity system map to inform the classification development.
We found that Foresight themes relating to individual behaviours were better captured within the cohort data, with the environmental, societal and food production areas more difficult to populate. Given that the UK Biobank cohort was first established prior to the 2007 Foresight report, it is not surprising that the study did not collect information from participants about their exposures to aspects of the wider obesity system, especially since the aims of the UK Biobank cohort study are broad, seeking a better understanding of the determinants of disease in general, not just obesity. The Foresight obesity systems map incorporates some upstream determinants of behaviours, for example, elements of food production in addition to food consumption. With regard to food consumption, the quantity of information available was challenging to incorporate into the classification. This is not a major issue, since we were not trying to generate dietary pattern clusters but whole systems obesity clusters; collapsing the dietary information into a small group of food consumption variables is therefore acceptable, a process explained in supplementary appendix S1. While some dietary information is lost using this approach, it achieved our goal of condensing the dietary information such that it did not dominate the classification. It is also important to acknowledge that dietary data collected in the UK Biobank were self-reported and therefore subject to bias. However, a recent validation study of the Oxford WebQ dietary questionnaire suggests reasonable agreement against biomarkers when compared with dietary recall interviews. Other self-reported measures, such as TV watching, were also used in our classification and we are not aware that these have undergone such scrutiny.
It was originally envisaged that more variables would be available to map onto the Physical Activity Environment Foresight theme from the UK Biobank Urban Morphometric Platform bulk data. However, on receipt of these data we found issues with their completeness and comprehensibility, for example incomplete greenness data and inconsistency between outlet density and count metrics in the same field. Recognising that there was still a need to incorporate some linked environmental data that considers the context in which individual behaviours are conditioned, we used the alternative greenspace indicator of Wheeler and calculated our own food environment exposure, recognising that positive correlations exist between counts of healthy and unhealthy food establishments [33, 34].
In developing the new classification we found that both age and gender were important, both with respect to the outcome BMI and as confounding factors for some of the classification variables, e.g. hand grip strength. Supplementary material in appendix S1 explains in detail how gender was used to standardise these variables. While these characteristics do not appear in the Foresight maps as features of the obesity system, they are still important to consider when developing methods to better understand obesity. This suggests that the Foresight obesity system map should perhaps encompass more nodes within its complex system map.
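As an illustration of what standardising a variable by gender can look like, the sketch below computes z-scores separately within each gender group, so that a value is expressed relative to others of the same gender. This is one common approach and is only an assumption here; the procedure actually used is the one described in appendix S1. The function name and the grip strength values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_within_group(values, groups):
    """Return z-scores computed separately within each group (e.g. gender)."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Mean and (population) standard deviation for each group.
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in by_group.items()}

    z_scores = []
    for value, group in zip(values, groups):
        group_mean, group_sd = stats[group]
        # Guard against a zero standard deviation in a constant group.
        z_scores.append((value - group_mean) / group_sd if group_sd > 0 else 0.0)
    return z_scores


# Hypothetical hand grip strength values (kg), NOT study data.
grip = [30.0, 34.0, 38.0, 44.0, 48.0, 52.0]
gender = ["F", "F", "F", "M", "M", "M"]

z = standardise_within_group(grip, gender)
print([round(v, 3) for v in z])
```

After this transformation, the lowest value among women and the lowest value among men receive the same standardised score, so a gender difference in raw grip strength no longer drives the clustering.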
This study employed a comprehensive variable selection process for the development of the obesity classification, driven by the Foresight obesity system map. This process was rigorous: it was completed by three independent researchers from a range of backgrounds, with agreement on variable selection reached through informed discussion. However, there are limitations to this approach, especially with respect to the omission of categorical variables, which cannot easily be used in the k-means clustering algorithm. Other approaches do exist which can deal with categorical variables [57, 58]; however, all algorithms require decisions and compromises to be made, and k-means has proved efficient and effective in producing distinct classes that exhibit significant differences in both weight status and the profiling variables identified in Table 3.
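For readers less familiar with k-means, the sketch below shows the standard Lloyd's iteration on continuous variables, written with the Python standard library only. It is an illustration, not the implementation used in this study; the function names, the toy data and the choice of two clusters are assumptions made for brevity (the classification described here has eight classes). The reliance on distances to a mean is also what makes categorical variables awkward to include.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids as k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


# Toy, already-standardised data with two obvious groups (NOT study data).
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 5.0), (5.1, 5.2),
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In practice the input variables would be standardised first so that no single variable dominates the distance calculation, and the algorithm would typically be run from several random starts, since k-means can converge to a local optimum.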
Here we only considered baseline BMI, as the follow-up measurements in the cohort were only available for subsamples. We did investigate agreement in measurement between the multiple time points and found that baseline BMI was strongly indicative of subsequent BMI, making it a suitable measure against which to test our classification. In the same vein, we did not make use of other candidate variables which were only available for a subset of UK Biobank participants, for example the accelerometer data.
Some caution must be taken when interpreting the results in this paper, given that the UK Biobank participants are, by design, from an older demographic and are largely White British in ethnicity, so the results are not generalisable to the wider UK population. These features mean that whilst obesity is seen to persist from childhood into adolescence, adulthood, middle age and into the senior years, it is only the later phase of this life course that is picked up here [60, 61]. That said, any biases in these data do not impact upon the methods and process we present for selecting variables and developing such a classification, since k-means does not require that a sample be representative of a population in order to make certain inferences.
Policy recommendations and future applications
We recommend using the Foresight obesity systems map as a framework to inform variable selection for obesity research and to drive policy making. We have demonstrated its ability to produce a clustering solution within the UK Biobank. While classes derived from the clustering remain relevant only to the population the data relate to, the methods for variable selection are consistent, meaning that they can be reproduced for different data and the results compared. However, for this to be applied most effectively we would recommend a broader representation of the data from each of the Foresight themes, which may be achieved through more targeted data collection in such cohort studies, or through collating other sources of information on the system, such as those provided by consumer data. A combination of existing data sources would present a powerful alternative to new primary data collection.
A further development of our classification, which would enrich understanding of the areas and groups of people most in need of positive change, would be to incorporate a geographic identifier into the whole system classification, akin to geodemographic classifications such as the Output Area Classification from the Office for National Statistics or commercial classifications like Cameo. With geographic identifiers incorporated, the potential use of this type of classification would extend to better targeting of resources and support. For example, it could be used by national policy makers to allocate funds to areas most in need of making system wide changes and, in turn, by public health directorates in local authorities to allocate resources to neighbourhoods most requiring support.
Overweight and obesity have been substantial public health challenges for some time, but in light of the recent ‘call to action’ within the UK’s National Obesity Strategy, citing COVID-19 as a wakeup call, methods to better target resources to improve health by reducing the prevalence of overweight and obesity are more important than ever. Indeed, we have found that our classification reveals significant differences in exposure, treatment and mortality for COVID-19 by assessing outcomes in the linked test, hospitalisation and deaths data available within UK Biobank.