Sharing good NEWS across the world: developing comparable scores across 12 countries for the neighborhood environment walkability scale (NEWS)

Background The IPEN (International Physical Activity and Environment Network) Adult project seeks to conduct pooled analyses of associations of perceived neighborhood environment, as measured by the Neighborhood Environment Walkability Scale (NEWS) and its abbreviated version (NEWS-A), with physical activity using data from 12 countries. As IPEN countries used adapted versions of the NEWS/NEWS-A, this paper aimed to develop scoring protocols that maximize cross-country comparability in responses. This information is also highly relevant to non-IPEN studies employing the NEWS/NEWS-A, which is one of the most popular measures of perceived environment globally. Methods The following countries participated in the IPEN Adult study: Australia, Belgium, Brazil, Colombia, Czech Republic, Denmark, Hong Kong, Mexico, New Zealand, Spain, the United Kingdom, and the United States. Participants (N = 14,305) were recruited from neighborhoods varying in walkability and socio-economic status. Countries collected data on the perceived environment using a self- or interviewer-administered version of the NEWS/NEWS-A. Confirmatory Factor Analysis (CFA) was used to derive comparable country-specific measurement models of the NEWS/NEWS-A. The level of correspondence between standard and alternative versions of the NEWS/NEWS-A factor-analyzable subscales was determined by estimating the correlations and mean standardized difference (Cohen’s d) between them using data from countries that had included items from both standard and alternative versions of the subscales. Results Final country-specific measurement models of the NEWS/NEWS-A provided acceptable levels of fit to the data and shared the same factorial structure with six latent factors and two single items. The correspondence between the standard and alternative versions of subscales of Land use mix – access, Infrastructure and safety for walking/cycling, and Aesthetics was high. 
The Brazilian version of the Traffic safety subscale was highly, while the Australian and Belgian versions were marginally, comparable to the standard version. Single-item versions of the Street connectivity subscale used in Australia and Belgium showed marginally acceptable correspondence to the standard version. Conclusions We have proposed country-specific modifications to the original scoring protocol of the NEWS/NEWS-A that enhance inter-country comparability. These modifications have yielded sufficiently equivalent measurement models of the NEWS/NEWS-A. Some inter-country discrepancies remain. These need to be considered when interpreting findings from different countries.


Background
Most studies of physical activity and built environments have been conducted in the USA, Australia and Western Europe, with recent studies extending findings to Japan [1], Colombia [2], China [3], Brazil [4], and elsewhere [5,6]. Though there have been important consistencies in the results [7], it is not possible to interpret different patterns of association by country because common methods were not employed. Further, the limited variability in environmental exposures and physical activity within countries may have led to underestimation of the strength of association [7]. International evidence about the associations of the built environment with physical activity could inform international and national policies and guide the implementation of international health strategies, such as those from the World Health Organization [8]. Only international studies using comparable methods can establish the extent to which environment and policy associations with physical activity are generalizable across countries or are country-specific. Such findings could inform evidence-based international and country-specific interventions to increase physical activity that could help underpin initiatives on the prevention of obesity and other noncommunicable diseases that are high in developed countries and growing rapidly in developing countries [9,10].
One of the main aims of the IPEN (International Physical Activity and Environment Network) Adult project is to conduct multi-country pooled analyses using comparable measures to estimate associations of perceived attributes of the neighborhood environment with physical activity and health-related outcomes across 12 countries [11]. The Neighborhood Environment Walkability Scale (NEWS) [12] and its abbreviated form (NEWS-A) [13] were selected as measures of perceived neighborhood characteristics hypothesized to be related to physical activity, especially walking (e.g., land use mix, street connectivity, and traffic safety). The NEWS and NEWS-A have been adapted and/or translated for use in various countries [14-21]. Both the original and adapted/translated versions have shown acceptable test-retest reliability, concurrent validity with respect to objective environmental measures, and some evidence of criterion validity with respect to physical activity outcomes [12,15-18,21-24].
To date, four studies have established measurement models of factor-analyzable items of the original [13,25] and adapted versions of the NEWS and/or NEWS-A [16,17] based on Confirmatory Factor Analyses (CFA). The CFA models describe the patterns of associations between items and their underlying latent constructs (e.g., street connectivity or aesthetics), thereby providing recommendations on how to summarize and score participants' responses on the subscales [13,25]. Specifically, CFA can evaluate the extent to which responses to questionnaire items (e.g., perceived high crime rate, feeling unsafe to walk in the neighborhood during the day, or at night) that are hypothesized to measure the same construct, also known as a latent factor (e.g., safety from crime), share common variance. For each questionnaire item, CFA yields standardized factor loadings that indicate the magnitude and direction of associations between the responses on the items and their underlying latent construct. For example, a standardized factor loading of −0.85 for the item "perceived high crime rate" on the latent factor of crime safety would indicate that its responses are strongly negatively correlated with the factor.
Because all four studies that conducted a CFA of the NEWS/NEWS-A used a two-stage cluster sampling strategy to recruit participants from selected study areas, they distinguished individual- from area-level measurement models, the former based on within-area differences and the latter on between-area differences in responses to the individual items. Because individual-level measurement models are more reflective of how perceptions of environmental characteristics group into factors and, thus, are likely more generalizable across populations than their area-level counterparts, it has been suggested that the NEWS and NEWS-A be scored according to the individual-level models [13,16,17,25]. The full and abbreviated versions of these instruments showed similar individual-level measurement models including the following multi-item latent factors: land use mix – access to services; street connectivity; infrastructure for walking/cycling; aesthetics; traffic safety; and safety from crime. Yet, several between-study discrepancies were noted, which might be attributed to differences in items or somewhat limited generalizability of specific measurement models to other geographical or cultural settings [25].
A prerequisite for conducting pooled analyses of multi-country data is the use of common protocols, including comparable exposure and outcome measures. In the case of the IPEN Adult project, a requirement for a country's inclusion was measurement of perceived neighborhood attributes using the NEWS or NEWS-A, representing one of the main exposure measures. However, IPEN Adult was not a multi-center study that was funded at the outset with all countries required to follow an exact protocol. To optimize resources, some IPEN countries were able to receive local funding and proceed with their study before a funded coordinating center was in place to implement tight quality control. This funding model enabled more countries to contribute data, strengthened the study, and allowed countries some level of flexibility in matching the protocol to the local context, thereby making the study more relevant to their national situation. However, the downside was lack of comparability in some study elements, including the set of NEWS items used. Hence, the aim of the present paper was to compare subsets of comparable NEWS/NEWS-A items used across the 12 IPEN countries and, based on empirical evidence on their CFA-derived individual-level measurement models, propose scoring protocols that maximize cross-country comparability of responses. CFA-based NEWS/NEWS-A scores with demonstrated comparability across countries could then be used for either pooled analyses combining countries, or for study-specific non-pooled analyses, thereby facilitating both cross-study and cross-country comparisons. This information is not only important to studies included in the IPEN initiative but also highly relevant to other researchers who used or will use the NEWS/NEWS-A, which are currently the most popular measures of perceived neighborhood environment worldwide. Additionally, this paper proposes a relatively simple analytical approach that can be used to create comparable measures for multi-country pooled analyses when some deviations in the measurement protocol exist across study sites.

Neighborhood selection
The IPEN Adult study is an observational epidemiologic multi-country cross-sectional study. Twelve countries participated: Australia, Belgium, Brazil, Colombia, Czech Republic, Denmark, Hong Kong, Mexico, New Zealand, Spain, the United Kingdom, and the United States. Study participants were selected from neighborhoods chosen to maximize the variance in neighborhood walkability and Socio-Economic Status (SES); this occurred in all countries except Spain, where neighborhood SES was not available. The goal of the study design was to have equal numbers of neighborhoods stratified as follows: high walkable/high SES, high walkable/low SES, low walkable/high SES, and low walkable/low SES. For selection of study neighborhoods, all countries except Spain used a neighborhood walkability index that was objectively defined using Geographic Information Systems data at the smallest administrative unit available. A neighborhood walkability index for the whole area of study was first developed [26]. Then, neighborhoods with relatively lower and higher walkability index scores by lower and higher SES indicators were selected. In nine countries, participants were recruited across the seasons to control for variations in weather that may affect physical activity. In six countries, participants were recruited equally across the neighborhoods by season. The details for each country can be found elsewhere [11].

Recruitment and participants
The required recruitment strategy was systematic selection of participants with addresses in the chosen neighborhoods. Adults living in the selected neighborhoods were contacted and invited to complete surveys on their physical activity and perceptions of the environment. Study dates ranged from 2002 to 2011. Each country obtained ethical approval from their local institutions and all participants provided informed consent. Recruitment ages ranged from 15 to 84 years. Four countries recruited participants by phone and mail, and eight of the studies contacted households in person. Databases of resident addresses from commercial and government sources were used for the phone and mail recruitment. For the in-person recruitment, standard procedures for identifying households and participants within a household were employed [27]. In Hong Kong, intercept interviews were conducted in residential areas where individual addresses were not available, for example, in large apartment buildings with restricted access. Six countries used monetary incentives, and four countries provided non-monetary incentives including feedback on physical activity [28]. Six countries employed self-report methods (mail and online surveys) to collect survey data, four countries used interviews, and two countries used both self-report and interview methods. Further details for the participant recruitment techniques and response rates across countries can be found elsewhere [11].
In total, N = 14,309 participants took part, with 512 to 2,650 individuals per country (see Table 1). The participants' mean age was 42.3 (SD = 12.9) years. Overall, 57.1% were women, 38% had a high school degree, 43.9% a college degree, and 59.6% were married or living with a partner. Demographic descriptive statistics for each country are also shown in Table 1.

Versions of the neighborhood environment walkability scale
General overview
The full and abbreviated versions of the NEWS and NEWS-A comprise 67 and 54 items, respectively [12,13]. They gauge the following perceived neighborhood attributes: (1) residential density; (2) land use mix – diversity; (3) land use mix – access; (4) street connectivity; (5) infrastructure and safety for walking; (6) aesthetics; (7) traffic safety; (8) safety from crime; (9) streets not having many cul-de-sacs; (10) physical barriers to walking; (11) parking difficult in local shopping areas; and (12) hilly streets in the neighborhood. The United States employed the original full NEWS; New Zealand used the NEWS-A; while the remaining 10 countries used various combinations of NEWS/NEWS-A items, in their original or slightly modified forms. All countries included at least some items gauging the first 10 neighborhood attributes listed above. All non-English versions of the instrument were forward-translated from English into the local language, culturally adapted (when needed), and back-translated into English. At least two expert raters reviewed all versions of the NEWS/NEWS-A and evaluated item content equivalence.

Subscales (original and adapted)
The Residential density subscale of the original NEWS and NEWS-A consists of six items rated on a 5-point scale (1 = none; 2 = a few; 3 = some; 4 = most; 5 = all) (Table 2). Eleven out of 12 countries used the original response scale, while Belgium used a 3-point scale (1 = none; 2 = some; 3 = many). For the purpose of this study, responses on this subscale were recoded to range from 0 to 4 (0 to 3 for Belgium; i.e., 0 = none; 2 = some; 3 = many, treated as equivalent to "most"). This was done to enhance the accuracy of the measure so that perceived absence of a specific type of density-related attribute (e.g., apartments or condos with 4-6 stories) would not positively contribute to the total residential density score. Six countries used all six original items. The remaining countries reduced the number of items by merging the content of adjacent items or by omitting those gauging the highest levels of residential density. Finally, Hong Kong added an item to account for extreme levels of residential density (high-rise buildings with more than 20 stories) (Table 2). Ratings on the original six items of this subscale are weighted relative to the average residential densities that they represent, these being 1, 12, 10, 25, 50, and 75, respectively [12]. Table 2 describes modifications to the scoring procedures of the original version of this subscale that will be adopted to make it comparable across IPEN countries (see column Proposed solutions). A summary residential density score is obtained by summing all weighted item scores.
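The weighted-sum scoring described above can be sketched as follows. This is a minimal illustration rather than the authors' scoring script: the function names and the example respondent are hypothetical, the weights are those quoted in the text, and the harmonized variant applies the Table 2 rule for sites that administered all six items (keep the higher of items 2 and 3 and weight it by 11).

```python
# Weights of the six original items, as quoted in the text [12].
ORIGINAL_WEIGHTS = (1, 12, 10, 25, 50, 75)


def recode(response: int) -> int:
    """Shift a 1-5 rating (1 = none ... 5 = all) to the 0-4 range."""
    if not 1 <= response <= 5:
        raise ValueError(f"rating out of range: {response}")
    return response - 1


def residential_density(responses, weights=ORIGINAL_WEIGHTS) -> int:
    """Weighted sum of recoded ratings across the subscale items."""
    if len(responses) != len(weights):
        raise ValueError("one rating is needed per weighted item")
    return sum(w * recode(r) for w, r in zip(weights, responses))


def harmonized_density(responses) -> int:
    """Table 2 rule for six-item sites: higher of items 2 and 3, weight 11."""
    if len(responses) != 6:
        raise ValueError("expected ratings for the six original items")
    r = [recode(x) for x in responses]
    return 1 * r[0] + 11 * max(r[1], r[2]) + 25 * r[3] + 50 * r[4] + 75 * r[5]


# Hypothetical respondent: all detached houses, a few townhouses, nothing else.
example = [5, 2, 1, 1, 1, 1]
print(residential_density(example))  # 1*4 + 12*1 = 16
print(harmonized_density(example))   # 1*4 + 11*1 = 15
```

Because "none" is recoded to 0, a respondent who reports no buildings of a given type contributes nothing for that item, which is the purpose of the recoding.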
Table 2

Residential density
Response scale
Deviation: Original scale ranging from 1 to 5, with 1 denoting 'none'. This yields positive density scores even in the absence of residential buildings and sometimes similar density scores for environments with different densities.
Proposed solution: Recode the original response scale ranging from 1 to 5 to 0 = none; 1 = a few; 2 = some; 3 = most; 4 = all.
Deviation: Belgium used a 3-point (none; some; many) rather than 5-point scale (none; a few; some; most; all) on all subscale items.
Proposed solution: The Belgian response "some" will be given a value of 2 (as in the recoded 5-point scale), while "many" will be assigned a value of 3, corresponding to "most" on the recoded 5-point scale.
1. Detached single-family residences
Deviation: None.
Proposed solution: NA.
2. Townhouses or row houses of 1-3 stories
Deviation: Hong Kong combined this item with item 3.
Proposed solution: This item and item 3 have similar density weights (12 and 10). The Hong Kong item will be given a mid-weight of 11. For all other sites, with the exception of the UK, which does not have item 3, the item with the highest rating (between items 2 and 3) will be chosen to be included in the summary residential density score and given a weight of 11. If items 2 and 3 have equal ratings, one of them will be included in the calculations and given a weight of 11. The UK will have this item weighted by 11.
3. Apartments or condos with 1-3 stories
Deviation: Hong Kong combined this item with item 2.
Proposed solution: See comments for item 2.
Deviation: The UK combined this item with item 4.
Proposed solution: For the UK site, see comment for item 4 below.
4. Apartments or condos with 4-6 stories
Deviation: The UK combined this item with item 3.
Proposed solution: Items 3 and 4 have very different weights (12 and 25, respectively). We will weight this UK item by the mid-point of the two original weights (18.5), with the assumption that the two types of apartments have similar prevalence.
5. Apartments or condos with 7-12 stories
Deviation: None.
Proposed solution: NA.
6. Apartments or condos with >12 stories
Deviation: Australia, Belgium, Denmark, the Czech Republic, and the UK did not include it because high-rise residential buildings were judged to have very low prevalence. Hong Kong modified this item to read "Apartments or condos with 12-20 stories".
Proposed solution: No action is required as such buildings have low prevalence in the study sites that omitted this item. The mean score on this item would have been close to 0 and, thus, would not add to the overall density score. Hong Kong modified this item to distinguish residential buildings with 12-20 stories from those with 20-50 stories, which are common in Hong Kong but not in other study sites.
7. Apartments or condos with >20 stories
Deviation: Hong Kong added this item to the subscale.
Proposed solution: This item will be given a weight of 100.

Land use mix – access
1. Stores within easy walking distance
Deviation: None.
Proposed solution: NA.
2. Many places within walking distance
Deviation: Belgium and the UK did not include this item.
Proposed solution: For these two countries, the CFA and summary score on this subscale will be based on 2 rather than 3 items. Using data from other countries, ascertain the correspondence between scores based on the 2- and 3-item subscales.
3. Easy to walk to a transit stop
Deviation: None.
Proposed solution: NA.

Street connectivity
1. Short distance between intersections
Deviation: Belgium did not include this item but had an additional item "four-way intersections" that is part of the NEWS street connectivity subscale.
Proposed solution: Include the additional connectivity item in the Belgian CFA model and, using data from countries that had all three street connectivity items, ascertain the comparability of the two 2-item street connectivity scores and a connectivity score based on an item common to Belgium and other countries (item 2).
2. Many alternative routes
Deviation: Australia did not include this item but had an additional item "four-way intersections" that is part of the NEWS street connectivity subscale.
Proposed solution: Include the additional connectivity item in the Australian CFA model and, using data from countries that had all three street connectivity items, ascertain the comparability of the two 2-item street connectivity scores and a connectivity score based on an item common to Australia and other countries (item 1).

Infrastructure and safety for walking and cycling
1. Sidewalks
Deviation: None.
Proposed solution: NA.
2. Cars separating sidewalks and traffic
Deviation: Hong Kong did not include this item. Belgium combined it with item 3.
Proposed solution: For these countries, the CFAs and summary score on this subscale will be based on a smaller number of items. Using data from other countries, ascertain the correspondence between scores on the subscale based on different combinations of items.

Aesthetics
Proposed solution: Include the additional item in the Australian CFA model and, using data from countries that had this additional item, ascertain the comparability of the Australian 4-item version of the subscale with the original 4-item version.
2. Many interesting things to look at
Deviation: Belgium and the UK did not include this item.
Proposed solution: For these two countries, the CFA and summary score on this subscale will be based on 3 rather than 4 items. Using data from other sites, ascertain the correspondence between scores based on the 3- and 4-item subscales.
3. Many attractive natural sights
Deviation: None.
Proposed solution: NA.
4. Attractive buildings/homes
Deviation: None.
Proposed solution: NA.

Traffic safety/hazards
1. Heavy traffic along nearby streets
Deviation: Brazil and the Czech Republic included a slightly different item from the NEWS: "heavy traffic along the street".
Proposed solution: Use the slightly different item as a substitute for item 1. Using data from other countries that included all relevant items, ascertain the correspondence between scores based on the common subscale and these two countries' version of the subscale.
2. Slow traffic speed on nearby streets
Deviation: Australia, Belgium, and the Czech Republic included a slightly different item from the NEWS: "slow traffic speed on the street".
Proposed solution: Use the slightly different item as a substitute for item 2. Using data from other sites that included all relevant items, ascertain the correspondence between scores based on the common subscale and these three sites' version of the subscale.
3. Speeding drivers
Deviation: Australia did not include this item.
Proposed solution: The Australian CFA and summary score on this subscale will be based on 2 rather than 3 items. Using data from other countries, ascertain the correspondence between scores based on the 2- and 3-item subscales.

The Land use mix – diversity subscale of the original NEWS and NEWS-A is assessed by the perceived walking proximity from home to 23 different types of destinations, with responses ranging from 1-5 minute walking distance (coded as 5, indicative of high walkability) to >30-min walking distance (coded as 1, indicative of low walkability). Given that the lists of destinations varied substantially across countries due to cultural and geographical idiosyncrasies, individual destinations were collapsed into nine destination categories common to all countries and 13 destination categories common to 11 out of 12 countries (the UK had nine of these categories). The nine categories common to all countries were: supermarket, small grocery or similar stores, post office, any school, transit stop, any restaurant, park, gym or fitness facility, and other stores and services. The 13 categories common to 11 out of 12 countries also included: library, video store, drug store/pharmacy and bookstore. Summary scores of land use mix – diversity are obtained by averaging ratings across the nine or 13 categories of destinations to give a 9-destination and a 13-destination average score, respectively.
The remaining seven perceived environmental attributes assessed by all IPEN Adult study sites were rated using the original 4-point Likert scale (1 = strongly disagree; 4 = strongly agree). Summary scores for these subscales are computed by averaging the scores on the corresponding items (reverse-scored, when necessary, in the direction consistent with higher walkability and safety). All countries included the single-item subscales of Streets not having many cul-de-sacs and Physical barriers to walking (e.g., canyons, railways, freeways), and the 3-item Safety from crime subscale of the NEWS-A. All countries, with the exception of Belgium and the UK, used all items of the Land use mix – access subscale of the NEWS-A (Table 2). The 2-item Street connectivity subscale of the NEWS-A was included in all studies except for Australia and Belgium, these having one common and an additional item. Nine countries included the complete Aesthetics and eight of them included the full Infrastructure and safety for walking/cycling subscales of the NEWS-A. The full NEWS-A Traffic safety subscale was included in eight countries (Table 2). A few countries included one or two Aesthetics or Traffic safety items that, albeit not identical, were potentially suitable substitutes for the original NEWS-A items (Table 2).
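As a rough illustration of the two averaging rules used for the non-density subscales, the sketch below computes a land use mix – diversity score from 1-5 proximity codes and a Likert subscale score with optional reverse scoring. The function names and all example ratings are hypothetical, and which items are reverse-scored is an assumption made for the example rather than a statement about the instrument.

```python
from statistics import mean


def land_use_mix_diversity(proximity_codes):
    """Average the 1-5 proximity codes over the destination categories.

    A code of 5 is the closest category (1-5 minute walk) and 1 the
    farthest (>30-minute walk), so higher scores mean higher walkability.
    """
    if any(not 1 <= code <= 5 for code in proximity_codes):
        raise ValueError("proximity codes must lie between 1 and 5")
    return mean(proximity_codes)


def likert_subscale(ratings, reverse=()):
    """Average 1-4 ratings; positions listed in `reverse` are flipped.

    Flipping maps 1<->4 and 2<->3 so that every item points in the
    direction of higher walkability or safety before averaging.
    """
    if any(not 1 <= rating <= 4 for rating in ratings):
        raise ValueError("Likert ratings must lie between 1 and 4")
    scored = [5 - r if i in reverse else r for i, r in enumerate(ratings)]
    return mean(scored)


# Hypothetical codes for the nine destination categories common to all sites.
nine_destinations = [5, 5, 4, 3, 5, 4, 2, 1, 3]
print(round(land_use_mix_diversity(nine_destinations), 2))

# Hypothetical 3-item subscale in which the first and third items are
# negatively worded and therefore reverse-scored.
print(likert_subscale([1, 4, 1], reverse={0, 2}))
```

The same `land_use_mix_diversity` call yields the 13-destination score when it is given 13 codes instead of nine.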

Socio-demographic characteristics
For the purpose of this paper, the following self-reported socio-demographic characteristics were considered: gender, age, educational attainment, and marital status.

Data analyses
Site-specific measurement models of the NEWS/NEWS-A
Individual-level, site-specific measurement models of the NEWS/NEWS-A were derived by conducting separate CFAs for each country on the responses to factor-analyzable items (all items except for those measuring Residential density and Land use mix – diversity). Area-level clustering effects arising from the two-stage sampling procedures used in all studies were accounted for by conducting CFAs on within-area variance/covariance matrices quantifying estimates of individual-level relationships between the items [17]. CFAs were based on the Maximum Likelihood Estimation method. A priori individual-level site-specific measurement models of the NEWS-A were formulated taking into consideration the available items across countries, their comparability (see Table 2), and findings of previous CFAs of the NEWS and NEWS-A [13,16,17,25]. Measurement models including the following factors and single items were estimated:
4. Aesthetics: four common items for nine countries; three common items for Belgium and the UK; three common and an alternative item for Australia (here, 'alternative' means not included in all countries)
5. Traffic safety: three common items for eight countries; two common and an alternative item for Brazil; one common and two alternative items for Australia; and another combination of one common and two alternative items for Belgium and the Czech Republic
6. Safety from crime: three common items for all countries
7. Not many cul-de-sacs: a single common item for all countries
8. Physical barriers to walking: a single common item for all countries.
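The within-area variance/covariance matrices mentioned above can be illustrated with a pooled within-area estimator: each response is centred on the mean of its own study area, and cross-products are divided by the number of participants minus the number of areas. This is a sketch of one common estimator and an assumption on our part, not necessarily the exact computation performed by the software used in the study; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def pooled_within_covariance(rows, areas):
    """Pooled within-area covariance matrix of item responses.

    rows  : list of equal-length lists, one list of item ratings per participant
    areas : list of area labels, one per participant
    """
    n_items = len(rows[0])
    by_area = defaultdict(list)
    for row, area in zip(rows, areas):
        by_area[area].append(row)

    dof = len(rows) - len(by_area)  # N participants minus G areas
    if dof <= 0:
        raise ValueError("need more participants than areas")

    # Centre every response on the item means of its own area, which
    # removes between-area differences from the cross-products.
    centred = []
    for members in by_area.values():
        means = [sum(col) / len(members) for col in zip(*members)]
        for row in members:
            centred.append([x - m for x, m in zip(row, means)])

    return [
        [sum(r[i] * r[j] for r in centred) / dof for j in range(n_items)]
        for i in range(n_items)
    ]


# Toy data: four participants, two items, two areas.
ratings = [[1, 2], [3, 4], [2, 2], [4, 6]]
labels = ["a", "a", "b", "b"]
print(pooled_within_covariance(ratings, labels))  # [[2.0, 3.0], [3.0, 5.0]]
```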
These eight factors were assumed to be inter-correlated. Jöreskog and Sörbom's [29] iterative model-generating approach was used to re-specify the models and was guided by an inspection of standardized factor loadings, standardized residual covariances, univariate Lagrange multiplier tests, Wald tests, multivariate outliers, and theoretical considerations [29]. The goodness-of-fit of the measurement models was assessed using a combination of model-fit indices recommended by Hu and Bentler [30] and Kline [31], including the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Squared Residual (SRMR). According to Hu and Bentler [30], values supportive of good model fit are ≥0.95 for CFI, ≤0.06 for RMSEA, and ≤0.08 for SRMR. Given that the CFI is sensitive to the magnitude of correlations between variables [31] and co-occurring environmental attributes sometimes show only modest associations [17,32], we treated CFI values ≥0.90 as indicative of acceptable levels of model fit if the other two fit indices met Hu and Bentler's [30] stricter criteria. We also reported the χ² test. EQS 6.2 [33] was used to conduct CFAs.
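The decision rule for model fit stated above can be written compactly as below. The thresholds are those given in the text; the function name and the category labels, in particular "inadequate" for models meeting neither rule, are our own shorthand, and the index values in the example are hypothetical.

```python
def classify_fit(cfi: float, rmsea: float, srmr: float) -> str:
    """Classify model fit from CFI, RMSEA, and SRMR.

    'good'       : CFI >= 0.95, RMSEA <= 0.06, SRMR <= 0.08
    'acceptable' : CFI >= 0.90 with RMSEA and SRMR meeting the strict cut-offs
    'inadequate' : anything else (label assumed for this sketch)
    """
    strict_rmsea_srmr = rmsea <= 0.06 and srmr <= 0.08
    if cfi >= 0.95 and strict_rmsea_srmr:
        return "good"
    if cfi >= 0.90 and strict_rmsea_srmr:
        return "acceptable"
    return "inadequate"


# Hypothetical fit indices for three models.
print(classify_fit(0.96, 0.05, 0.07))  # good
print(classify_fit(0.92, 0.05, 0.07))  # acceptable
print(classify_fit(0.92, 0.07, 0.07))  # inadequate
```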

Comparability of various versions of NEWS-A subscales
The level of overlap between corresponding standard and alternative versions of the NEWS/NEWS-A factor-analyzable subscales was determined by assessing the strength of associations (Pearson correlation coefficient) and mean effect size difference (in the form of Cohen's d) between them. This could be done using data from countries that had included items from both standard and alternative versions of the subscales. A correlation coefficient ≥0.80 (indicative of collinearity) [34] and an absolute Cohen's d <0.25 (indicative of very small effect sizes) [35] were considered supportive of significant conceptual overlap between the versions of the subscales.
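A minimal sketch of this comparability check is given below. The two thresholds come from the text; the function names and scores are hypothetical, and the particular form of Cohen's d (mean difference divided by the pooled standard deviation of the two versions) is an assumption, since the exact formula is not stated here.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def cohens_d(x, y):
    """Mean difference over the pooled SD of the two versions (assumed form)."""
    pooled_sd = sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled_sd


def comparable(standard, alternative, r_min=0.80, d_max=0.25):
    """True when r >= r_min and |d| < d_max, the criteria given in the text."""
    r = pearson_r(standard, alternative)
    d = cohens_d(standard, alternative)
    return r >= r_min and abs(d) < d_max


# Hypothetical subscale scores for four respondents with both versions.
standard = [1.0, 2.0, 3.0, 4.0]
close_alternative = [1.1, 2.0, 3.1, 3.9]
shifted_alternative = [2.0, 3.0, 4.0, 5.0]

print(comparable(standard, close_alternative))    # True
print(comparable(standard, shifted_alternative))  # False
```

The second case shows why both criteria are needed: a version can correlate perfectly with the standard subscale and still differ systematically in its mean.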

Country-specific measurement models of the NEWS/ NEWS-A
After deletion of up to six multivariate outliers per country, the a priori individual-level measurement models of the NEWS/NEWS-A showed acceptable fit to the data of seven out of 12 countries (Table 3). The CFI values for the remaining countries did not meet the set criterion (≥0.90). An inspection of the standardized factor loadings, standardized residuals, and Wald tests revealed that in the case of the Czech Republic, Mexico, Spain, Denmark, and the United Kingdom, two items (Cars separating sidewalks and traffic; Grass/dirt separating sidewalks and traffic) did not significantly load, or loaded in the opposite direction to that expected, on the latent factor they were supposed to measure (Infrastructure and safety for walking/cycling). These two items also showed lower than desirable standardized loadings (<|0.30|) [36] for data from Australia, Colombia, and New Zealand. It was, thus, decided to omit them from all measurement models in order to ensure cross-country comparability. Apart from excluding the two problematic items, all models were re-specified by allowing item error terms to be correlated (where appropriate) and constraining inter-factor correlations to zero where the data did not provide sufficient support for an association. All re-specified models fitted the data sufficiently well, with five measurement models also fully satisfying Hu and Bentler's [30] stricter goodness-of-fit criteria (Table 3). All standardized factor loadings were significant at a probability level of <0.001 in the expected direction (Table 4), with nearly all of them exceeding an absolute value of 0.30, indicating a substantial relationship [36]. All measurement models shared the same structure with six latent factors and two single items (Table 4). The average inter-factor correlations were low. However, across the various models, the Street connectivity latent factor was moderately or highly correlated with Land use mix – access and Infrastructure and safety for walking/cycling (Table 4).

Comparability of versions of the NEWS/NEWS-A subscale
Due to differences in items and, hence, measurement models (Table 4), the NEWS/NEWS-A subscales of certain countries departed from the standard versions (Table 5). Comparability analyses showed that there was a high level of correspondence between the standard and alternative versions of the following subscales: Land use mix – access, Infrastructure and safety for walking/cycling, and Aesthetics. Using data from countries that had data on all relevant items, strong associations and small differences in means were found between scores on these three standard and alternative subscales. The alternative version of the Traffic safety subscale used by Brazil was also highly comparable to the standard version. The level of comparability of the remaining two versions of the Traffic safety subscale (for Australia and for Belgium and the Czech Republic) was marginally acceptable, with average correlations and/or Cohen's d at the limits of the acceptable range of values. The multi-item alternative versions of the Street connectivity subscale for Australia and Belgium did not provide a good match to their standard counterpart. However, single-item versions showed higher, yet marginally acceptable, levels of correspondence (Table 5).

Discussion
The IPEN Adult project aims to conduct pooled analyses of associations of perceived environment with physical activity and health outcomes using data from 12 countries differing in built and social environments [11]. The significant between-country differences in socio-demographic, cultural, and environmental factors required cultural adaptations to the NEWS [12] or NEWS-A [13], the perceived neighborhood environment measures adopted by the IPEN. To assist pooled analyses, it was important to establish scoring protocols that maximize the amount of usable data and yield comparable NEWS/NEWS-A summary scores across countries. These protocols are also relevant to future studies using the NEWS/NEWS-A since they facilitate cross-study comparison and, hence, can contribute to a better understanding of environment-physical activity relationships. Several between-country differences were observed on all subscales of the NEWS/NEWS-A with the exception of Safety from crime and two single-item subscales. As detailed earlier, differences in the Residential density and Land use mix – diversity subscales were resolved by modifying the scoring protocols of their original versions so as to maximize inter-country comparability. For the remaining factor-analyzable items, well-fitting, comparable, individual-level measurement models consisting of eight distinct constructs (Table 4) were derived after omitting two problematic items (Cars separating sidewalks and traffic; Grass/dirt separating sidewalks and traffic). These findings provide further support for the robustness and generalizability of the factorial structure of NEWS and its abbreviated form, NEWS-A.
Interestingly, the two poor-fitting items mentioned above had some of the lowest standard factor loadings in the measurement models of the original NEWS and NEWS-A [13,25] and the Australian version of the NEWS [16]. In the present analyses, one or both items did not substantially load on any latent factor of the measurement models for Australia, Colombia, Denmark, Mexico, Spain, and the UK. Additionally, for the Czech Republic and Mexico, the item "Cars separating sidewalks and traffic" showed an unexpected negative loading on the factor it was supposed to measure (Infrastructure and safety for walking/cycling). Indeed, the presence of parked cars along sidewalks may in some cases be indicative of higher volumes of vehicular traffic and, hence, reflective of traffic hazards rather than safety. This conjecture was confirmed by field observations by the two respective research teams. Given the ambiguous interpretation of these two items across IPEN countries, it was decided to exclude them from all country-specific measurement models. Thus, we do not recommend they be used in the IPEN pooled analyses, though they can still be relevant in within-country analyses.
An issue worth mentioning pertains to the strong correlations observed between Street connectivity and the factors of Land use mix – access and Infrastructure and safety for walking/cycling, exceeding 50% of shared variance in four countries. This raises multicollinearity concerns for multi-predictor regression models including all three subscales as explanatory variables, at least with respect to the countries where this appears to be a significant problem (Brazil, Colombia, Hong Kong, and the UK). Yet, these correlations represent associations between latent factors rather than subscale scores, the latter being used in regression analyses. Latent factor scores include only items' communalities (the items' variances that are accounted for by a factor), while subscale scores representing the average rating on the relevant items also include items' uniquenesses, which are in part random error variance [36]. Thus, correlations between latent factors are usually higher than those between the corresponding 'raw' subscale scores. Post-hoc analyses revealed that this was also the case in this study. Specifically, the correlations between the scores on the Street connectivity and the other two potentially collinear subscales ranged from 0.23 to 0.41. Hence, the simultaneous inclusion of these three subscales as explanatory variables in regression models should not create multicollinearity problems.
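The attenuation argument above can be illustrated with a minimal sketch. It assumes a simplified model that is not the fitted IPEN measurement model: two latent factors with correlation rho, each measured by the same number of standardized items with equal loadings and mutually independent uniquenesses. All numeric inputs are illustrative and not taken from the study. Under these assumptions, the correlation between two mean-based subscale scores equals rho multiplied by the share of subscale-score variance due to the factor.

```python
import math
import random


def expected_subscale_correlation(rho, loading, n_items):
    # Variance of a mean-based subscale score: common part (loading^2)
    # plus the averaged uniqueness ((1 - loading^2) / n_items).
    common = loading ** 2
    unique_share = (1.0 - common) / n_items
    return rho * common / (common + unique_share)


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_subscale_correlation(rho, loading, n_items, n=20000, seed=1):
    # Monte Carlo check of the formula above under the same assumptions.
    rng = random.Random(seed)
    unique_sd = math.sqrt(1.0 - loading ** 2)
    scores_a, scores_b = [], []
    for _ in range(n):
        f_a = rng.gauss(0, 1)
        # Second factor constructed to correlate rho with the first.
        f_b = rho * f_a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0, 1)
        items_a = [loading * f_a + unique_sd * rng.gauss(0, 1) for _k in range(n_items)]
        items_b = [loading * f_b + unique_sd * rng.gauss(0, 1) for _k in range(n_items)]
        scores_a.append(sum(items_a) / n_items)
        scores_b.append(sum(items_b) / n_items)
    return pearson_r(scores_a, scores_b)


# Illustrative inputs: a latent correlation of 0.75 (more than 50% shared
# variance), three items per subscale, loadings of 0.6.
latent_rho = 0.75
analytic = expected_subscale_correlation(latent_rho, loading=0.6, n_items=3)
simulated = simulate_subscale_correlation(latent_rho, loading=0.6, n_items=3)
```

Shared variance above 50% corresponds to a latent correlation above roughly 0.71. With the illustrative inputs, the implied correlation between subscale scores is markedly lower than the latent correlation, which is the pattern the post-hoc analyses describe.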
As noted earlier, six out of 12 countries did not use standard versions of the NEWS/NEWS-A subscales. An analysis of the level of comparability of the various versions of the subscales revealed some potential concerns with the Australian and Belgian versions of (single-item) Street connectivity and Traffic safety, and the Czech version of Traffic safety. Specifically, the average correlations between the standard and alternative versions of the subscales only just met the adopted comparability criterion (≥0.80) and substantial variability was observed in the individual country-specific correlations. This has some implications for the interpretation of results from pooled analyses with respect to these subscales. Namely, any between-country (Australia, Belgium, and the Czech Republic vs. other countries) differences in associations of these two perceived environmental attributes with outcome measures (detectable in the form of significant country-by-perceived-attribute interaction effects) may be in part due to measurement differences. Hence, caution will be needed in interpreting such significant interaction effects (if any), especially if they are small in magnitude.
Also of concern are the observed differences in mean scores between the standard and two alternative versions of the Traffic safety subscale. Although the alternative subscales had acceptable or marginally acceptable average effect sizes, some country-specific effect sizes were too large, suggesting the possibility that differences in scores between the standard and alternative subscales for Australia, Belgium, and the Czech Republic (if data on all relevant items had been available from these countries) might be substantial. Thus, pooled analyses of between-site differences in average perceived neighborhood environmental attributes will need to take into account the possibility that the presence or lack of differences in perceived traffic safety between these three sites and the remaining sites may be due to differences in measures.
Given the above, we recommend that for the purpose of conducting pooled analyses on data from the 12 IPEN countries, the factor-analyzable NEWS/NEWS-A subscales be scored according to the respective country-specific measurement models presented in Table 4, with the exception of the Australian and Belgian versions of the Street connectivity subscale which, for these two countries, should consist of a single item (see algorithms presented in Table 6). The Residential density and Land use mix – diversity subscales should be scored according to the algorithms shown in Table 6, which are based on previously presented analyses and remarks (see Table 2 and Methods section). We also recommend that future studies employing country-specific versions of the NEWS/NEWS-A use the protocols proposed here.

Limitations
The main limitations of this study pertain to differences in the participant recruitment procedures, survey administration mode, and the use of somewhat different versions of the NEWS/NEWS-A across countries. However, differences in the ways participants were recruited are unlikely to have had a systematic impact on the inter-item associations and, thus, measurement models of the NEWS/NEWS-A. Additionally, it is important to note that nearly all countries used the same stratified two-stage sampling procedure to recruit participants from neighborhoods selected on the basis of their socio-economic and walkability levels. This likely contributed to the recruitment of relatively comparable samples across countries in terms of educational attainment and exposure to low- vs. high-walkable environments, two characteristics that may impact the accuracy and variability of responses to measures of perceived neighborhood environment [15,37,38]. Self-administered surveys may have resulted in less socially desirable and more consistent data [39]. Yet, differences in survey findings across modes of administration are generally small and more pronounced when examining sensitive topics [39]. However, the items included in the NEWS/NEWS-A (e.g., access to services and traffic safety) are unlikely to be perceived as delicate issues. The fact that the 12 IPEN countries did not use exactly the same group of survey items precluded the conduct of a more rigorous assessment of the cross-country equivalence of the NEWS/NEWS-A, whereby measurement model fit would be assessed by progressively increasing the number of constrained parameters (i.e., parameters would be constrained to be equal across IPEN countries), starting from factor loadings and finishing with the variances of error terms [40]. With the available data, we were only able to demonstrate cross-country configural equivalence of the NEWS/NEWS-A, i.e., that all measurement models consisted of the same latent factors. This is an essential requirement for the conduct of pooled analyses based on NEWS/NEWS-A data.
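For readers less familiar with this hierarchy, the sequence of increasingly constrained models can be written out in conventional multi-group CFA notation. The symbols below are assumptions introduced for illustration and do not appear in the original analyses: x is the vector of item responses of respondent i in country g, with intercepts tau, loadings Lambda, latent factors xi, and error terms delta. The text above names only the first constrained step (factor loadings) and the last (error variances); the intermediate step on intercepts is the one conventionally used and is included here on that basis.

```latex
\begin{aligned}
x_{ig} &= \tau_g + \Lambda_g \xi_{ig} + \delta_{ig},
  \qquad \operatorname{Var}(\delta_{ig}) = \Theta_g, \qquad g = 1, \dots, G \\[4pt]
\text{configural:}\quad & \Lambda_g \text{ has the same pattern of zero and non-zero loadings in every country } g \\
\text{equal loadings:}\quad & \Lambda_1 = \Lambda_2 = \cdots = \Lambda_G \\
\text{equal intercepts:}\quad & \text{equal loadings, plus } \tau_1 = \tau_2 = \cdots = \tau_G \\
\text{equal error variances:}\quad & \text{equal intercepts, plus } \Theta_1 = \Theta_2 = \cdots = \Theta_G
\end{aligned}
```

Only the first of these levels could be examined with the available data, because the later levels require the same items in every country.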

Conclusions
To improve inter-country comparability and allow pooled analyses of data in investigations of associations of perceived neighborhood environment with physical activity and health outcomes, we have proposed modifications to the original scoring protocol of the NEWS/NEWS-A and have established country-specific, comparable measurement models to be employed in future analyses. A few potential inter-country discrepancies remain with respect to the measurement of street connectivity and traffic safety, which need to be considered in the interpretation of findings based on pooled analyses and comparison of findings from different countries. We recommend that future studies using the NEWS/NEWS-A implement the scoring protocol proposed here to facilitate cross-study comparability and interpretation of the findings. Importantly, the analytical approach presented in this paper could also be used by other multi-site projects that have variations in the measurement protocol and require optimization of data inclusion and comparability for pooled analyses and cross-site comparisons.
Abbreviations
CFA: Confirmatory factor analysis; NEWS: Neighborhood environment walkability scale; NEWS-A: Neighborhood environment walkability scale-abbreviated version; IPEN: International physical activity and environment network; SES: Socio-economic status; SD: Standard deviation; CFI: Comparative fit index; RMSEA: Root mean square error of approximation; SRMR: Standardized root mean squared residual.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
EC conceptualized the study and modifications to the scoring protocols, devised and conducted the confirmatory factor and comparability analyses, drafted the Results and Discussion sections and part of the Introduction and Methods sections, and is one of the PIs of the Hong Kong study. TLC is a co-investigator on the US study and Coordinating Center grant, had primary oversight for data management and processing of data from countries at the San Diego Coordinating Center, constructed the datasets used for analyses, and assisted in drafting the manuscript. KLC is the project manager for the US study and Coordinating Center grant and coordinated cross-country data collection efforts and data processing at the San Diego Coordinating Center. JK is a co-founder of the IPEN group and a co-investigator on the Coordinating Center grant and assisted in drafting the manuscript. IDB is co-founder of the IPEN group and PI of the Belgian study. NO co-founded the IPEN group and is PI of the Australian study. RSR coordinated and is PI of the Brazilian study. OLS coordinated and is PI of the study in Bogotá (Colombia). EAH is co-investigator of the New Zealand arm of the IPEN study. DS coordinated and is co-investigator of the Mexican arm of the IPEN study. LBC coordinated and is co-investigator of the Danish study. DJM is the project manager and one of the PIs for the Hong Kong study. RD coordinated and is PI of the UK study. JM is co-investigator of the study conducted in the Czech Republic. IAO coordinated and is co-investigator of the Spanish study. JFS co-founded the IPEN group and is PI of the US study and Coordinating Center grant. All authors reviewed and edited the manuscript, and approved the final manuscript as submitted.