A systematic coding scheme was developed for the MAPS streetscape observation tool that summarized 160 items into 26 specific subscales, 10 valence scores, and one final overall score for each of the MAPS sections: route, crossings, segments, and cul-de-sacs. A majority of items (75.6%) and all but one of the subscales (96.1%) had moderate or good/excellent reliability. The most common items with lower agreement were those assessing slope, subjective qualities (e.g., the social environment/disorder), and those with complex response options (e.g., percentage of sidewalk shaded by trees, number of façade types/colors). When necessary, reconceptualization, recoding, and editing were carried out for under-performing items and scale scores. This scoring and scale development approach allows examination of microscale attributes at varying and tiered levels of specificity and complexity. The guiding assumption was that the aggregate environmental impact is likely to be most influential on physical activity. It is possible that critical positive or negative single factors can override an aggregate score, and the present scoring system allows examination of detailed items, specific conceptual categories, or aggregate scores. Thus, all coding and scoring decisions were made considering expected effects on physical activity, but the scales may be relevant for other outcomes, such as social interactions, social capital, perceptions of environments, and neighborhood satisfaction as well.
Numerous microscale tools have been developed, reflecting a recognition that the details of streetscapes are likely to affect experience and behavior beyond macro-level variables (e.g., street connectivity) that are studied much more often. However, the lack of well-developed and accessible scoring systems, along with the expense of data collection, seems to have inhibited the use of microscale tools, because there are few published reports using microscale data [9, 27, 34, 35]. The complexity of the measures also appears to have been a deterrent to developing scoring systems because of differing response scales, item formats, and conceptual domains. Many of the scoring systems that have been developed have not been tested for inter-rater reliability or reliability has not been reported in the literature [14–16, 19, 21, 36]. Thus, the present scoring system represents an advance, and the principles used may be applicable to other microscale tools.
Though all published microscale measures assess a wide range of attributes, MAPS may be more comprehensive because it assesses and provides scores for routes, crossings, segments, and cul-de-sacs. Present results are expected to be generalizable, at least to metropolitan areas in the United States, because data came from three geographic regions (Baltimore, MD/Washington, DC; San Diego, CA; and Seattle, WA) and included neighborhoods representing a wide range of urban design and income. Though MAPS has not been used in rural areas, it could be applicable to such settings as well, although further development of microscale evaluation in such areas is needed.
The innovative approach of evaluating routes beginning with each study participant's home was adopted for two major reasons. First, the routes were highly personalized and designed to reflect the environment each participant would likely encounter while walking or bicycling to a nearby destination. Second, the route approach was efficient and controlled the time cost of assessing each person’s local environment. Selecting routes was appropriate for the parent studies because the participants’ locations were known. Other microscale tools were designed to audit a sample of street segments, and then create summary scores that can be related to individual behavior [9, 19, 20, 33]. However, there are major questions about how many segments or what percent of neighborhood streets need to be observed to represent an adequate sample. Observed streets may or may not be representative of the environment experienced by a given participant. The larger the areas over which environmental characteristics are averaged, the less valid they are likely to be for each participant. Thus, observation of individual routes has important procedural advantages for some types of studies and is likely to produce more valid estimates of effect sizes in relation to physical activity and other outcomes.
Another advantage of the scoring system outlined in the present paper is its generalizability. The principles used to score MAPS may be applicable to other audit tools. The principles included the combination of conceptual- and policy-driven scoring systems, items scored as dichotomies or trichotomies, hypothesis-driven classification into positive or negative subscales for expected effects on physical activity, and the inclusion of a hierarchy of scores from construct-specific to aggregated. One likely benefit of this approach is that specific construct measures may be useful for guiding city planning and transportation decisions. It also may help residents understand the strengths and weaknesses of their microscale (street and pedestrian) environments, including assessing local disparities in environmental quality. The overall scores and aggregated scores may be better suited for explaining outcomes because cumulative effects across numerous attributes are more likely than individual items to influence behavioral outcomes and neighborhood perceptions.
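The tiered scoring principles described above can be illustrated with a minimal sketch. It assumes items already recoded as dichotomies (0/1) or trichotomies (0/1/2), subscales tagged by hypothesized valence, and an overall score formed by subtracting the negative total from the positive total. The item names, subscale names, and values are hypothetical placeholders, not actual MAPS items or the published scoring syntax.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Subscale:
    """A construct-specific subscale (hypothetical structure, for illustration)."""
    name: str
    valence: str              # "positive" or "negative" expected effect on activity
    items: Tuple[str, ...]    # names of the recoded items it sums


def subscale_score(subscale: Subscale, item_scores: Dict[str, int]) -> int:
    """Sum the recoded (0/1 or 0/1/2) items that belong to one subscale."""
    return sum(item_scores[item] for item in subscale.items)


def valence_scores(subscales: List[Subscale], item_scores: Dict[str, int]) -> Dict[str, int]:
    """Aggregate subscale scores into positive and negative valence totals."""
    totals = {"positive": 0, "negative": 0}
    for subscale in subscales:
        totals[subscale.valence] += subscale_score(subscale, item_scores)
    return totals


def overall_score(subscales: List[Subscale], item_scores: Dict[str, int]) -> int:
    """Overall score: positive valence total minus negative valence total."""
    totals = valence_scores(subscales, item_scores)
    return totals["positive"] - totals["negative"]


# Hypothetical recoded items for one audited segment (not real MAPS items).
items = {
    "sidewalk_present": 1,  # dichotomy: 0 = no, 1 = yes
    "buffer_present": 0,    # dichotomy
    "street_lights": 2,     # trichotomy: 0 = none, 1 = some, 2 = ample
    "litter": 1,            # dichotomy
    "graffiti": 1,          # dichotomy
}

subscales = [
    Subscale("sidewalk", "positive", ("sidewalk_present", "buffer_present")),
    Subscale("lighting", "positive", ("street_lights",)),
    Subscale("disorder", "negative", ("litter", "graffiti")),
]

totals = valence_scores(subscales, items)
overall = overall_score(subscales, items)
print(totals, overall)
```

Keeping the item, subscale, valence, and overall levels as separate functions mirrors the hierarchy of scores: an analyst can stop at whichever level of specificity suits the question, for example a single subscale for a planning decision or the overall score for explaining behavior.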
The MAPS scoring and reliability process included several limitations. The decision-making processes for item retention and scale creation were driven by theory [28, 38, 39], research team discussion, statistical criteria, policy relevance, and expert consensus. While these strategies included accepted methods for such decision-making, there were ambiguous cases and situations. The originally proposed conceptually driven process had to be modified throughout the course of data reduction. While we present one method of combining items and subscales, subtracting negatives from positives may not reflect the complexity of various microscale elements. Ultimately, scales were created with the aforementioned blend of theory, practicality, and expert consensus in hopes that this method may prove useful to wider audiences and researchers. The present scoring system may not be optimal, and other approaches to combining items and examining patterns of attributes should be explored.
A limitation of the design is that the selected route may not be a direction usually taken by the participant, or a participant may take multiple routes to the destination(s) of interest. However, the route selection procedure was designed to identify a likely route in the direction of the nearest destination. The most important environmental attributes (e.g., a high-speed roadway) may have been just beyond the ¼-mile designated route. This issue can be better addressed in forthcoming validity analyses.
Quantifying the diversity of built environment variables is inherently complicated, and rarely does one answer fit all situations. Given that the focus of the present scoring process was on influences relevant to physical activity and the desire to link variables with planning and transportation policies, some of the decisions may have created scales that do not apply to all potential future uses. For example, present scoring may not be generalizable if scoring neighborhoods for relevance to social capital or in very different contexts such as Asian or African cities. Further, the three regions sampled herein likely do not represent the entire range of environments in which MAPS may be used in the future. During analysis, many of the dropped items were determined to contain ambiguous wording or concepts. Future investigators can revise or exclude items in a modified MAPS but should carefully describe changes and additions, including validation of any changes.
Typical methods of scale development include factor analysis and assessment of internal consistency (i.e., Cronbach’s alpha), but they were not used in the present study. These approaches are appropriate for psychological measures but may not be for environmental assessment. Features that are conceptually related (e.g., street lights and speed limit for safety) may not co-exist, so internal consistency is not a sufficient index of scale quality. The present goal was to create conceptually coherent and policy-relevant scales. By identifying clusters of environmental attributes under the jurisdictions of planning, transportation, and public works departments, results can be more relevant for establishing accountability for making environmental improvements that would be expected to promote physical activity.
The next analytic step is to conduct validity analyses that will relate these scale scores to total physical activity and specific walking outcomes (e.g., walking to school, for leisure, for transport) in multiple age groups. A further step would be to develop an electronic interface for integrated data collection and scoring (e.g., iPad app). Following validation, these scales can be used for research and practice purposes. As tools like MAPS are used in different environments and countries, and by investigators interested in different outcomes, the tool itself is likely to be modified. It is difficult to imagine how a standard version could be developed that could be used for community intervention and policy advocacy purposes and be comparable across locations. As one step toward more frequent use of streetscape audits, validity analyses could result in a shorter form containing the most valid items and scales.