Measuring capabilities in health and physical activity promotion: a systematic review

Background: The capability approach by Amartya Sen and Martha Nussbaum has gained increasing attention in the field of public health. Because it combines individual, social and structural factors and shifts the focus of attention from actual behavior towards the options for health behaviors that people can actually choose from, it may help advance our understanding of complex health issues.
Objectives: The aim of this article is to identify and describe tools available to measure capabilities within the context of health, with a specific focus on capabilities for health-enhancing physical activity.
Method: We conducted a systematic literature review using 11 databases covering scientific journal articles published in English or German between the years 2000 and 2020 with a focus on capabilities for health or physical activity.
Results: We found a total of 51 articles meeting our inclusion criteria. Four articles measured capabilities using qualitative methods, one combined qualitative and quantitative methods, while the rest used quantitative methods. We identified a total of 11 different capability questionnaires, all showing moderate to good validity/reliability. Only one questionnaire and one interview-based tool specifically dealt with capabilities for health-enhancing physical activity.
Conclusion: Although we were able to identify measurement tools for capabilities in health, this review has shown that there is no generic tool available for measurement across all population and age groups, and tools focusing on physical activity are scarce. However, our results can be used as a guide for future projects that aim at measuring capabilities.


Background
Over the last years, the capability approach, originally developed by Amartya Sen [1] in welfare economics, has gained increasing attention in the field of health and has been used in multiple health promotion projects [2-6]. A recent review by Helter et al. [7] highlights this growing relevance of the capability approach in health promotion, particularly regarding its use within health economic evaluation of projects. The capability approach shifts the focus of attention from an individual's actual behavior, i. e. the realization of "various things a person may value being or doing" [8] such as having a healthy diet (called "achieved functionings"), towards the real opportunities available to individuals to choose from, i. e. the "various combinations of functionings that the person can achieve" [8] (called "capabilities").
The shift of focus from people's behavior towards the real opportunities that they can value and realize can be particularly beneficial in the field of health promotion. In the context of this paper, we look at the capability approach from the perspective of physical activity (PA). PA is commonly defined as "any bodily movements produced by skeletal muscles that result in energy expenditure" [9] and has been shown to have a positive impact on people's health, e. g. in relation to obesity, non-communicable diseases (e. g. diabetes, high blood pressure), cardio-respiratory health, cancer, mental health and all-cause mortality [10,11]. Health-enhancing PA (HEPA) may take many forms across multiple domains, e. g. during leisure time (e. g. sports, walks or hiking), at the workplace, during transport (e. g. biking to school), or at home (e. g. gardening) [12].
Current efforts to promote PA, however, tend to focus on "downstream" interventions (e. g. physical education in school or structured PA classes for older people) that promise to have immediate effects on the target group's health [13]. Such interventions focus mainly on outcome improvement, i. e. achieved health functionings, and tend to neglect the environmental or social components that led to the outcome in the first place. As a result, they may be less sustainable than more "upstream" interventions whose effects cannot immediately be measured in terms of target group behavior change (e. g. those that initiate infrastructure change [14] or that increase individuals' physical literacy, i. e. their "motivation, confidence, physical competence, knowledge and understanding to value and take responsibility for maintaining purposeful physical pursuits/activities throughout the lifecourse" [15]). To achieve sustainable behavior change, HEPA interventions need to extend their focus from outcomes alone (e. g. steps, hours spent being physically active, effects on weight etc.) towards also considering the capabilities of target groups to engage in desired behavior or to achieve valued states of being.
The capability approach may help achieve this shift of focus by pointing to the benefits in terms of capabilities for healthy behavior [2] or, as in our specific case, HEPA. It explicitly respects people's freedom to decide for or against a healthy behavior and looks at available or unavailable components which may have led to the specific outcome. Therefore, applying the capability approach within a health promotion project may enhance the target group's compliance by focusing on how to positively change the opportunities for health that they consider meaningful and desirable, rather than merely "forcing" them to behave in a healthy manner (e. g. through mandatory physical education in schools) to achieve positive change of health outcome.
In general, a person's capabilities for health-enhancing behavior can be assumed to be based on a set of capitals or resources [5] that are "translated" into capabilities through three sets of conversion factors [6]: (1) individual (e. g. physical condition, biological health or health literacy), (2) social (e. g. norms and values, social practices or political rules), and (3) environmental factors (e. g. climate, pollution, infrastructure). However, operationalizing a concept as complex as the capability approach [3] (or, to give another example, Antonovsky's [16] "sense of coherence") for actual measurement is challenging, as it is rather theoretical in nature and underspecified (potentially by design) with respect to empirical application. Nonetheless, the increasing popularity of the capability approach in health and PA promotion obliges us to assess not only health status and indicators of behavior but also the opportunities that people have to realize healthy behavior.
The aim of this paper is to support researchers in health and HEPA promotion who intend to use the capability approach by (1) systematically identifying all currently available tools to measure capabilities for health, well-being, and PA, (2) providing an overview of the main features of these tools as well as their psychometric properties, and applicability to different areas, and (3) discussing how the identified capability measures can be specifically used in the field of HEPA promotion by future researchers.

Methods
Research for this paper was conducted in the context of Capital4Health, a research consortium funded by the German Federal Ministry of Education and Research [01EL1421A-F] which aimed at promoting active lifestyles in four different settings across the life-course using the capability approach. A project (CAPCOM, [01EL1421A]) tasked with fostering cooperation in the consortium conducted the systematic review at hand in order to strengthen its common methodological base. The presented work followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [17].
An initial exploratory search for instruments to measure capabilities specifically for PA indicated that only a limited number of instruments were dedicated to this topic. We therefore decided to broaden the search to include capability measurement tools for health in general. This expansion may seem radical but was a logical next step given our health-centered perspective on PA and HEPA [12]. As options for measuring capabilities for PA are limited, gathering information on available measurement tools for the general capabilities of health and well-being will enable the identification of tools that can be adapted to PA or, in cases where adaptation is difficult, provide valuable lessons for the future development of new specific capability measurement tools for HEPA.
Supported by a university librarian, the research team developed a set of search strings consisting of variations of the terms "capability approach", "measurement", "health" and "physical activity" combined with Boolean operators. A full version of the search string is provided in the appendix. On 14 October 2020, searches were conducted on the following eleven databases: APA PsycInfo, Psychology and Behavioral Sciences Collection, SPORTDiscus, and APA PsycArticles (via EBSCOhost); Applied Social Sciences Index & Abstracts, Sociological Abstracts, Social Services Abstracts, Worldwide Political Science Abstracts, International Bibliography of the Social Sciences, and the Sports Medicine & Education Index (via ProQuest); and PubMed. Table 1 summarizes the inclusion/exclusion criteria applied to the results. Articles were included if they (a) were published between January 2000 and October 2020; (b) were written in English or German; (c) were scientific journal articles; and (d) had a clear focus on the operationalization of the capability approach within the context of health or HEPA. No restrictions were placed on (e) population, (f) setting, or (g) country.
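The way such a search string is assembled can be sketched in a few lines of Python. This is an illustration only: the term variants below are hypothetical stand-ins and not the search string used in this review (which is given in the appendix), and the choice to join "health" and "physical activity" with OR is likewise an assumption.

```python
# Illustrative only: the term variants are hypothetical, not the review's
# actual search string. Each key is one concept; its list holds variants.
CONCEPT_BLOCKS = {
    "capability approach": ['"capability approach"', "capabilit*"],
    "measurement": ["measur*", "questionnaire*", "instrument*"],
    # Assumption: health and physical activity are alternatives (OR).
    "health or physical activity": ["health*", '"physical activity"'],
}


def or_group(terms):
    """Join the variants of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(blocks):
    """Combine the concept groups with AND so every concept must be matched."""
    return " AND ".join(or_group(terms) for terms in blocks.values())


query = build_query(CONCEPT_BLOCKS)
print(query)
```

Keeping each concept in its own OR group makes it easier to adapt the string to the syntax of individual databases, since truncation and phrase operators differ between search platforms.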
Two researchers independently screened all titles/abstracts based on the inclusion/exclusion criteria and discussed their results to resolve disagreement. Two researchers then independently screened the full texts of all remaining papers and discussed their results to reach consensus on the articles to be included for detailed analysis. In addition, the lead author carried out a supplementary hand search, the results of which were double-checked by another researcher. The final included search results were imported into EndNote X9 and analyzed regarding (i) the proposed types of measurement instruments for capabilities, (ii) the development process employed to develop these instruments, and (iii) the empirically tested validity, reliability, and responsiveness of the instruments among different target groups.
For better comparison, we rated instrument quality in this paper as follows. Construct validity was categorized as "good" when correlations with any other chosen instrument had been shown to be at least moderate and significant, or when its chi-square analysis had been shown to be significant at the 5% level [18]. We rated only the outcomes reported in the respective paper, not the measurement tool used for the comparison. Discriminant validity was rated as "good" when the instrument showed a significant (at least p < .01) distinction between different areas. Internal consistency was considered "good" at α > .7 [19], as was test-retest reliability with an at least moderate Cohen's kappa (> .41) [20] or an intraclass correlation coefficient over .75 [21].
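The numeric thresholds above can be written down as a small set of rating rules. The following Python sketch is not the procedure used by the authors but a minimal illustration of the stated cut-offs; the function names and the labels "not good" and "not reported" are assumptions introduced here, as the text only defines when a property counts as "good". Construct validity is left out because the text does not give a numeric cut-off for a "moderate" correlation.

```python
from typing import Optional

# Cut-offs taken from the rating rules described in the text.
ALPHA_MIN = 0.70        # internal consistency: alpha > .7
KAPPA_MIN = 0.41        # test-retest reliability: Cohen's kappa > .41
ICC_MIN = 0.75          # test-retest reliability: ICC over .75
DISCRIMINANT_P = 0.01   # discriminant validity: p < .01


def rate_internal_consistency(alpha: float) -> str:
    """Rate Cronbach's alpha; 'not good' is a label assumed for this sketch."""
    return "good" if alpha > ALPHA_MIN else "not good"


def rate_test_retest(kappa: Optional[float] = None,
                     icc: Optional[float] = None) -> str:
    """Rate test-retest reliability from Cohen's kappa and/or an ICC."""
    if kappa is None and icc is None:
        return "not reported"
    kappa_ok = kappa is not None and kappa > KAPPA_MIN
    icc_ok = icc is not None and icc > ICC_MIN
    return "good" if (kappa_ok or icc_ok) else "not good"


def rate_discriminant_validity(p_value: float) -> str:
    """Rate discriminant validity from the reported p-value."""
    return "good" if p_value < DISCRIMINANT_P else "not good"


print(rate_internal_consistency(0.82))
print(rate_test_retest(icc=0.80))
print(rate_discriminant_validity(0.03))
```

Because all comparisons are strict, a value that exactly equals a threshold (e. g. α = .70) is not rated "good" in this sketch.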

Results
The search yielded a total of N = 11,354 hits matching the search terms across all eleven databases. After removing all duplicates, a total of 8,515 articles remained for screening. Researchers had substantial agreement on title/abstract screening (Cohen's κ = 0.66), disagreeing mostly on the use of the capability approach within a paper [22]. This step yielded a total of 101 articles eligible for full-text screening. Researchers had moderate agreement in full-text screening (Cohen's κ = 0.44) [20], leading to the exclusion of another 55 articles. Disagreement on inclusion or exclusion was mostly about the level of operationalization of the capability approach in papers, i. e. whether articles actually provided a full-fledged measurement tool or merely a theoretical framework. Five additional articles were identified during the hand search, resulting in a total of N = 51 articles included in this review, covering either the development of instruments for measuring capabilities according to the capability approach or their psychometric properties. A visual representation of the search is shown in Fig. 1 using the PRISMA flowchart [17].
Table 2 provides an overview of the different measurement tools reported in the 52 identified articles. We found that instruments to assess capabilities fall into three major categories: (1) qualitative tools, e. g. using interviews or videography (n = 5), (2) quantitative tools, e. g. questionnaires (n = 46), and (3) mixed method approaches using a combination of interviews and questionnaires (n = 1).
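The agreement statistics and flow counts reported above can be illustrated with a short Python sketch. It computes Cohen's kappa from a 2x2 table of two screeners' decisions and checks the screening arithmetic. The example table is hypothetical, since the raw screening decisions are not reported; the verbal labels follow the widely used Landis and Koch bands, and it is an assumption that these are the bands behind the terms "moderate" and "substantial" used here.

```python
def cohens_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two raters making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n
    a_include = (both_include + only_a) / n   # rater A's inclusion rate
    b_include = (both_include + only_b) / n   # rater B's inclusion rate
    # Agreement expected by chance if both raters decided independently.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Verbal label using the Landis and Koch bands (assumed here)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def included_after_screening(full_text_screened, excluded, hand_search_added):
    """Number of articles included after full-text screening and hand search."""
    return full_text_screened - excluded + hand_search_added


# Hypothetical 2x2 table, for illustration only.
example_kappa = cohens_kappa(both_include=20, only_a=5, only_b=10, both_exclude=15)
print(round(example_kappa, 2), agreement_label(example_kappa))

# Labels for the two kappa values reported in the text.
print(agreement_label(0.66), agreement_label(0.44))

# Flow counts reported in the text: 101 screened, 55 excluded, 5 added by hand.
print(included_after_screening(101, 55, 5))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be modest even when two screeners agree on most records, for instance when almost all records are excluded by both.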

Types of measurement instruments
In the quantitative category, n = 5 articles measured capabilities through analyzing secondary data (e. g. data from the British Household Panel Survey [41]), while n = 41 covered a total of eleven individual questionnaires. Of these, four belong to the ICECAP family (ICEpop CAPability index of the "Investigating Choice Experiments for the Preferences of Older People" project) and use varying sets of items to cover specific target groups and outcome variables: the ICECAP-O for older adults [31] and ICECAP-A for adults [32] with five items each, the ICECAP-SCM measuring capabilities of people in need of supportive care [33] containing seven items, and the ICECAP-FC for adults measuring both functioning and capabilities [34] with ten items. Another set of questionnaires comes from the "Oxford Capability Questionnaire" family, including the original OCAP (Oxford Capability Questionnaire) [35] with 64 items, the shortened OCAP-18 [42] (18 items), and a version adapted to mental health, the OxCAP-MH (Oxford Capability Questionnaire for Mental Health) [36] (16 items). The most comprehensive questionnaires are the CQ-CMH (Capability Questionnaire for community mental health) [37] with 104 items and its adapted version, the ACQ-CMH (Achieved Capability Questionnaire for community mental health) [38] with 98 items. The systematic search further identified two questionnaires that did not belong to a larger "family" of tools, the Capability Based Questionnaire for Patients with Chronic Pain [39] (8 items) and the Capability Assessment for Diet and Activity (CADA) geared towards adults suffering from obesity and diabetes [40]. All identified questionnaires are constructed for self-completion and use subjective measures to assess capabilities. Table 3 reports on the main aims of the included articles as well as the main methods used to develop the individual measurement tool or to empirically test its measurement properties.
Out of the 52 included articles, 8 described the development of a measurement instrument [31-35, 39, 40, 42], 20 focused on checking psychometric properties of existing tools [49, 50, 52, 54-58, 61, 63-71, 74], 2 evaluated different instruments comparatively [28, 75], and 8 reported results of actual measurements of health-related capabilities [23, 24, 27, 41, 44-47]. The remaining (n = 14) articles had a mixed focus: on development/measurement (n = 2) [25, 26], development/psychometric properties (n = 9) [29, 30, 36-38, 43, 51, 53, 62], or comparison/psychometric properties (n = 3) [59, 60, 73].

Main aims and methods employed
Among the qualitative tools, only Sauter et al. [25] provided details on the development process: their interview guidelines were the result of literature screening and a conscious selection of specific items from the OCAP questionnaire [35]. The identified questionnaires were developed using different methodologies. For example, the OCAP [35] is based on a set of largely theoretical criteria by Martha Nussbaum, who co-developed the original capability approach [76]. The ICECAP-O questionnaire was compiled based on a previously conducted literature review and developed through in-depth interviews with the respective target group [31]. The ICECAP-A [32] and ICECAP-SCM [33], the Capability Based Questionnaire for Patients with Chronic Pain [39], and CADA [40] were developed by conducting iterative interviews with the respective target group. The CQ-CMH [37] emanated from the analysis of focus group data, expert opinion, and an additional alignment with the Nussbaum criteria.
Articles reporting on studies that directly measured capabilities without developing or validating any tools for future use were only found among the qualitative studies and secondary data analyses. Qualitative measurement was performed either by semi-structured interviews [23][24][25][26], observation [24] or video analysis [27], while secondary data was analyzed via methods such as regression [41] or equation modelling [44].
Detailed psychometric properties were only reported for the quantitative measurement instruments. The most detailed results were available for questionnaires of the ICECAP family. Both the ICECAP-O and the ICECAP-A were reported to have good construct validity [49,51,52,64,66], good convergent validity when compared to the EQ-5D instrument for measuring generic health status [29,53,55,58,60,62,67,70,71], and good discriminant validity [29,55,58,67,71]. The ICECAP-O and ICECAP-A further showed good test-retest reliability [29,43,57,62,77] and good internal consistency [29,51,58,70]. In addition, the ICECAP-A was found to be significantly responsive among adults with knee pain [66] and women with irritative lower urinary tract syndrome [67]. No psychometric properties were reported for the ICECAP-SCM and ICECAP-FC questionnaires. In the OCAP family, no details were available for the originally developed questionnaire [35]. For the OCAP-18, only good construct validity was reported, based on correlation with the EQ-5D-3L questionnaire [42]. The adaptation of the OCAP for mental health showed good convergent validity [36,74], internal consistency, and test-retest reliability [74], which was also confirmed for its German version [73]. With respect to the other questionnaires, Sacchetto et al. [38] reported good content and discriminant validity as well as internal consistency for the ACQ-CMH. Good internal consistency was reported for most questions of the CADA questionnaire [40], while no psychometric properties were reported for the Capability Based Questionnaire for Patients with Chronic Pain [39].
Overall capabilities, capabilities for health, and capabilities for PA
While some of the questionnaires focus on the overall capabilities to pursue one's goals and to be content with one's own life (e. g. the ICECAP questionnaires [31-34]), others are concerned with more specific aspects, such as enjoying recreational time, political views, making friends, or areas relevant to this study, e. g. bodily health and integrity (e. g. OCAP questionnaires [35,42,74]). Some questionnaires focus on specific subsets of health-enhancing factors, such as the CADA [40], which is concerned with capabilities for healthy diet and PA but does not measure overall capabilities for health or well-being. A similar pattern can be found for the qualitative tools: while Ndomoto et al. [24] focus on general capabilities for health, Abu-Zaineh et al. [44] explicitly deal with capabilities for health and self-management among diabetes patients. The tool by Sauter et al. [25] is the only qualitative tool with a focus on capabilities for PA as a health-promoting factor.
Among the questionnaires, CADA [40] is the only one to directly measure capabilities for PA by specifically asking about resources (e.g. money to afford going to the gym) as well as environmental (e.g. indoor and outdoor PA spaces available), social (e.g. surrounding people are supportive of one's PA) and individual (e.g. mental and physical health influencing PA) factors of influence. The other questionnaires do not specifically ask for capabilities to pursue PA or sports but at least partially address areas that can be considered relevant for health-enhancing PA, such as physical suffering (ICECAP-SCM [33]), bodily health or enjoyment of recreational activities (OCAP [35] and OCAP-18 [42]). The qualitative tools do not explicitly address capabilities for PA. The only exception is Sauter et al. [25], which specifically asks for the individual (e. g. knowledge about PA), social (e. g. family and friends support) and environmental factors (e. g. offerings) that influence the opportunities of seniors in retirement homes to be physically active.

Discussion
The aim of this review has been to give an overview of the current state of research on available tools to measure capabilities for health based on the approach originally developed by Sen and Nussbaum, with a special focus on identifying those potentially relevant for HEPA. The systematic search was able to identify capability measurement tools for health and HEPA using qualitative, quantitative, and mixed methods between 2008 and 2020. It has explored the main features and psychometric properties of the identified tools, as well as their past application to different age and target groups.
Despite the number of papers identified, it is interesting to note that the number of distinct tools reported remains limited. For instance, there is a total of eleven questionnaire-based tools, most of which are variations and adaptations of either the ICECAP or the OCAP questionnaire. It is noteworthy that, although there are variations of the above-mentioned questionnaires for use among different target groups, there is no tool available to objectively and comprehensively measure all aspects of health-related capabilities, especially when considering that the approach was first published in 1985 [1], connected to well-being as early as 1993 [3], and has recently gained even more attention in the field of public health.
The analysis revealed a great degree of methodological variation regarding the development of the interview guidelines and questionnaires. Some studies approached the development from a more philosophical view and based their interview guideline [25] or questionnaire items [35] on Martha Nussbaum's capability criteria [72]; others used an explorative approach, conducting focus-group [39,40] or key-expert interviews [36,42] to inductively develop their questionnaire. Another research group developed the questionnaire based solely on the opinion of an expert group [38]. While our results allow no conclusions about which method is more appropriate or valid, those choosing a tool for a specific health promotion project should consider whether its development method and target group fit the intended application context. The variety of the available tools suggests that measuring capabilities may generally be a rather context- and target group-specific undertaking and may always require adaptation to different contexts and target groups. However, as this impedes the comparability of studies that target capabilities for health, working towards the development of tools applicable to more than one context may seem necessary.
The analyzed questionnaires that were empirically tested showed moderate to good validity, reliability and responsiveness among different groups and in comparison with other questionnaires, mostly variations of the EQ-5D well-being questionnaire (i. e. EQ-5D-3L). This approach, however, poses an important theoretical issue, as it seems to imply that capability measures are better if they have a higher degree of correlation with measures of well-being. But according to Sen, well-being is a combination of "achieved functionings" [3], which are linked to but by no means perfectly correlated with a person's options (capabilities). To give an example, a person with a variety of options that may positively influence their health has the freedom to choose their eventual course of action and may actively decide not to realize a specific behavior. If we take the capability approach seriously, we must necessarily expect a considerable mismatch between functionings and capabilities, and using this kind of validation approach appears generally problematic. To validate such a measurement tool, a more comprehensive and thus perhaps more challenging approach might be necessary, e. g. by attempting to account for all individual, structural, and environmental opportunities as well as a target group's resources.
Another issue is that the number of items used to measure capabilities varied considerably between questionnaires, i. e. between five items (ICECAP-O/ICECAP-A) and 104 items (CQ-CMH). This raises the question whether all identified tools, even though they may have been validated, allow for measuring with the same accuracy. More research is required to investigate this, but in any case, health promoters interested in measuring capabilities will have to consider whether it will be feasible to administer the tool of their choice in practice, especially regarding those with a large number of items.
Our findings seem to support the conclusions of a previous literature review by Helter et al. [7] that there remain important conceptual and methodological issues in the field of measuring capabilities. At the same time, our study adds a new perspective, as Helter et al. [7] investigated the use of tools for economic evaluation while our main focus has been on measuring change and health intervention effectiveness.
Our research was guided by the intention to identify suitable tools for measuring capabilities for PA across the life-course. However, only two of the identified measurement instruments explicitly address PA, i. e. the CADA questionnaire [40] and the interview-based tool by Sauter et al. [25]. CADA is not geared exclusively at PA but combines it with capabilities for healthy diet. In addition, it was developed for populations suffering from obesity rather than general populations. Similarly, Sauter et al.'s tool has a specific focus on senior citizens. In other questionnaires, only individual items might be considered relevant for PA, e. g. questions on bodily health [35,36,42]. Therefore, they cannot be applied to draw precise conclusions on people's PA capabilities. Nevertheless, this study provides researchers and health promoters with a number of options for measuring capabilities that may be useful for the field of HEPA if adapted accordingly.
All in all, our study shows that more research is needed to develop appropriate capability instruments for HEPA. First, these should focus on measuring PA and all its facets, including the individual (e. g. PA-related competence), social (e. g. social support for PA), and environmental (e. g. PA infrastructures and offers) conversion factors. Second, a future measure for capabilities should ideally be applicable to a broader range of different settings, populations, and age-groups, thus allowing for standardized and comparable assessments of PA intervention effectiveness.
As HEPA can be considered a functioning which is intended to be changed by interventions, a combination of measuring both capabilities and functionings (e. g. as done by Al-Janabi [34]) might be advisable in the field. This may help future researchers to identify effects of their interventions on both levels.
We were able to identify very context-specific measurement tools, which seems appropriate given the context-specific nature of the capability approach but is likely to impede the comparability of intervention effectiveness.
To strike a compromise between detailed but setting-exclusive tools and overly generic instruments, there might be a need for a framework for conceptualizing and measuring capabilities for health, including our aim of health-enhancing PA across the life-course, as was done with the ICECAP measurement tool [79]. Such a framework is currently in preparation, with the intention to define a number of principles that will ensure greater comparability between age groups and settings while still allowing for the use of adapted instruments in different contexts (Till M, Gelius P, Abu-Omar K, Abel T: Using the capability approach in health promotion projects: a framework for implementation, Under review).
Despite our best efforts, this study has some limitations which need to be borne in mind when interpreting its results and drawing conclusions. First, due to the heterogeneity of the tools identified, comparing individual instruments with each other was difficult, and it was therefore not possible to recommend a single tool that, in general, could be considered particularly appropriate. For the same reason, a more systematic quality assessment of the primary studies, as required by the PRISMA checklist, was not possible. Further, as we only included studies on psychometric properties that came up in our initial systematic search but did not perform a second search for psychometric property measurements for all identified quantitative tools, the results shown in this paper may miss some studies. All in all, however, we are confident that this review provides a good initial overview of an innovative and increasingly relevant area of research. Having been conducted on a large number of databases and employing an additional hand search, it presents details on different types of instruments that may guide the selection of appropriate tools for specific purposes in future research projects.

Conclusion
This systematic review has shown that there is a large variety of measurement tools available which address different aspects of capabilities, target groups or contexts. To date, there is no gold standard for measuring capabilities for health and therefore also none for PA. The available tools vary substantially regarding their underlying assumptions, focus on capabilities, properties (e. g. language, number of items), development processes, measurement approaches, and addressees. Most of the quantitative tools have been empirically shown to be valid, reliable and responsive, but the methods employed for validation invite skepticism as to whether all instruments truly measure capabilities and/or do so in a meaningful way. At this point in time, it is not possible to recommend a single tool for general use, and health promoters may want to choose carefully or even consider adapting a tool to their specific needs. Our findings may help inform researchers about available measurement tools that represent different options for measuring capabilities for health and well-being, and which can be used as references for the future development of a measurement tool for capabilities for health-enhancing PA.
Our findings thus seem to echo Sen's own concerns about the empirical difficulties of operationalizing the capability approach [1,80], as well as those of other researchers who have objected that the multidimensional, context-dependent, and normative nature of the approach can pose problems for operationalization [81-83].
These difficulties notwithstanding, the Capital4Health consortium, under whose auspices this review was conducted, is planning to contribute to the further development of capability measurement in health promotion and PA intervention research.