Development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS)
© Millstein et al.; licensee BioMed Central Ltd. 2013
Received: 14 September 2012
Accepted: 11 April 2013
Published: 27 April 2013
Streetscape (microscale) features of the built environment can influence people’s perceptions of their neighborhoods’ suitability for physical activity. Many microscale audit tools have been developed, but few have published systematic scoring methods. We present the development, scoring, and reliability of the Microscale Audit of Pedestrian Streetscapes (MAPS) tool and its theoretically-based subscales.
MAPS was based on prior instruments and was developed to assess details of streetscapes considered relevant for physical activity. MAPS sections (route, segments, crossings, and cul-de-sacs) were scored by two independent raters for reliability analyses. There were 290 route pairs, 516 segment pairs, 319 crossing pairs, and 53 cul-de-sac pairs in the reliability sample. Individual inter-rater item reliability analyses were computed using Kappa, intra-class correlation coefficient (ICC), and percent agreement. A conceptual framework for subscale creation was developed using theory, expert consensus, and policy relevance. Items were grouped into subscales, and subscales were analyzed for inter-rater reliability at tiered levels of aggregation.
There were 160 items included in the subscales (out of 201 items total). Of those included in the subscales, 80 items (50.0%) had good/excellent reliability, 41 items (25.6%) had moderate reliability, and 18 items (11.3%) had low reliability, with limited variability in the remaining 21 items (13.1%). Seventeen of the 20 route section subscales, valence (positive/negative) scores, and overall scores (85.0%) demonstrated good/excellent reliability and 3 demonstrated moderate reliability. Of the 16 segment subscales, valence scores, and overall scores, 12 (75.0%) demonstrated good/excellent reliability, three demonstrated moderate reliability, and one demonstrated poor reliability. Of the 8 crossing subscales, valence scores, and overall scores, 6 (75.0%) demonstrated good/excellent reliability, and 2 demonstrated moderate reliability. The cul-de-sac subscale demonstrated good/excellent reliability.
MAPS items and subscales predominantly demonstrated moderate to excellent reliability. The subscales and scoring system represent a theoretically based framework for using these complex microscale data and may be applicable to other similar instruments.
The relationship between several built environment factors and physical activity has been established in multiple reviews [1–4]. Larger characteristics, often called macro-level attributes of environments (e.g., walkability, density, street connectivity, land-use mix), are well-documented correlates of walking and physical activity [5–7]. Most of the built environment and physical activity evidence is based on macro-level variables. However, macro-level factors do not reflect the entirety of people’s experiences with their environment. “Microscale” factors include details about streets, sidewalks, intersections, and design characteristics (e.g., road crossing features, presence of trees, bicycle lanes, curbs), as well as characteristics of the social environment (e.g., stray dogs, graffiti, trash). Microscale factors may also influence physical activity [5, 9, 10] but have not been studied as extensively as macro-level factors. The study of microscale factors allows for a more fine-grained examination of the environmental features that enable or inhibit physical activity, and microscale features may be modified more easily and cost-effectively than macro-level characteristics.
Microscale factors are typically assessed using direct observations. Many research-based microscale observation or audit tools for streetscapes (environment features in and along streets) have been developed in the past decade [5, 11], and more recently, virtual audit tools using online resources like Google Streetview have been tested (maps.google.com) [12, 13]. There is much heterogeneity of content across the microscale audit tools, although reviews have found common themes: building type, streets and traffic, sidewalks, bicycle facilities, public spaces/amenities, architecture/building characteristics, parking/driveways, maintenance, and safety [5, 10]. As summarized by Brownson et al., several microscale audit tools have been systematically developed and have demonstrated good inter-observer agreement. The current study presents methods building on these previous approaches and provides a systematic scoring system for summarizing the data, so results can be more easily interpreted.
Interpretable and policy-relevant summary scores are needed to provide evidence that could be used for decisions about making the built environment more supportive of physical activity. Several groups have developed conceptual models of microscale attributes, some of which have been used as the basis for scoring systems [14–20]. These models and scoring systems are at varying stages of development, including conceptual model only, scoring system created but inter-rater reliability not tested or not reported [14, 15, 18–20], and scoring system created but based on a statistical rather than conceptual model [17, 21]. These frameworks are all steps toward systematic scoring of microscale tools, but there is a need for a comprehensive approach that includes a conceptually driven model and scoring procedures, along with evaluation of subscales’ reliability.
The present paper describes the development and evaluation of summary scores of a comprehensive microscale audit tool adapted from previous measures. We present a systematic approach to developing a scoring system, data reduction methods, and psychometric analysis for subscales. The overall goal was to create summary scores that can be used to assess detailed attributes of the built environment relevant to physical activity for use in research, city planning, and community advocacy that may promote activity-supportive environmental changes. To evaluate inter-rater reliability of subscales, the present paper combined data from three U.S. regions with varying levels of urbanity and walkability.
Study designs and areas
Study characteristics and designs: Neighborhood Impact on Kids (NIK) Study, Teen Environment and Neighborhood (TEAN) Study, and Senior Neighborhood Quality of Life Study (SNQLS)
Ages of participants
Design (quadrants sampled)
Total sample size: All routes
Reliability pair sample size: # routes (# segments, # crossings, # cul-de-sacs)
Years of MAPS data collection
6-11 and a parent
San Diego County, CA and Seattle/King County, WA
Activity Environment × Nutrition Environment
Cluster of ≥3 destinations (commercial locations, parks or schools)
San Diego County: 365
San Diego County: 76 (233, 117, 16)
San Diego County: 2009
Seattle/King County: 393
Seattle/King County: 0 (0, 0, 0)
Seattle/King County: 2009-2010
12-16 and a parent
Seattle/King County, WA and Baltimore, MD-DC
Cluster of ≥3 commercial locations, a park, or a school
Seattle/King County: 427
Seattle/King County: 72 (167, 67, 31)
Seattle/King County: 2010
Maryland-DC: 106 (42, 100, 6)
Seattle/King County, WA
Cluster of ≥3 destinations (commercial locations, parks, or school)
Seattle/King County: 462
Seattle/King County: 36 (74, 35, 0)
Seattle/King County: 2009
Microscale Audit of Pedestrian Streetscapes (MAPS) tool development
The MAPS tool was adapted from previous tools, primarily the Analytic Audit Tool, as modified by the Healthy Aging Network, and further modified by present investigators (Additional file 1: Appendix A). Specific items thought to be relevant for seniors or youth were added to the tool for all groups of participants (e.g., sidewalk cross-slope). A cul-de-sac section was added for the youth studies because of their potential use as play areas. The MAPS tool can be found online at http://sallis.ucsd.edu.
MAPS section descriptions
Route:
•Approximately ¼ mile from a participant’s home toward a predetermined destination.
•Included components of land use and destinations, streetscape, aesthetics and social environment.
•Consisted of varying numbers of segments and crossings within the ¼ mile.
Segments:
•A section of a street between two crossings.
•If street name changed, a new segment started.
•There were up to 8 segments per route.
Crossings:
•A crossing occurred when the rater went through an intersection, whether a pedestrian crossing existed or not.
•There were up to five crossings per route.
Cul-de-sacs:
•A cul-de-sac or dead-end street had to be within 400 feet of a participant’s home.
•The cul-de-sac was usually (but not always) the dead-end part of the participant’s street.
•There were up to two cul-de-sacs per route.
The route section included items related to land use and destinations, transit stops, street amenities, traffic calming, hardscape and softscape aesthetics, and the social environment. The segments section assessed sidewalks, street buffers, sidewalk slope, bicycle facilities, shortcuts, visibility from buildings (“eyes on the street”), building aesthetics, trees, setbacks, and building height. The crossings section assessed crosswalks, slopes, width of crossings, crossing signals, and pedestrian protection (e.g., curb extensions, protected refuge islands). The cul-de-sacs section assessed the potential recreational environment within a cul-de-sac and included items about the size and condition of the surface area, slope, surveillance from surrounding homes, and amenities (e.g., basketball hoops).
Data were collected along a ¼ mile route (n = 2117 routes) starting at a study participant’s home and walking toward the nearest pre-determined destination, although not necessarily reaching a destination (Table 2). The ¼ mile designation was a major change from previous instruments and was chosen to standardize the method and limit observation time, but this approach may not be suitable for all study purposes. In each study, eligible destinations included a cluster of shops or services, a park, or a school. The shortest route from a participant’s home to the nearest eligible destination was identified using Network Analyst (ArcGIS version 9.3, ESRI, Redlands, CA, 2009). The ¼ mile endpoint was determined using Google Maps (maps.google.com). In the TEAN study in Seattle/King County, WA, and Baltimore, MD-DC, data were also collected along the commercial center nearest the participant’s home. These commercial centers were identified using land use data available in GIS databases created for each study. For each route, a map was created to guide raters to the specified starting point (participant’s home or commercial center), the route to be walked, and the endpoint. The routes followed the road network (i.e., alleys and informal paths were not considered eligible routes). If the destination was reached before ¼ mile, raters continued toward the next identified destination and ended ratings there. If the destination was not reached in ¼ mile, the route ended at the conclusion of the segment that included the ¼ mile endpoint (see manual online: http://sallis.ucsd.edu). Each residential route consisted of 1 to 8 segments, 1 to 8 intersections/crossings, and up to 2 cul-de-sacs (if applicable), depending on block size and street design within the ¼ mile. Each commercial route consisted of one segment and the two intersections on either end of the segment.
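The rule that a route ends at the conclusion of the segment containing the ¼ mile endpoint can be illustrated with a minimal sketch. The function name, the use of feet as the unit, and the example segment lengths are assumptions made for illustration only; the sketch does not model the case in which a destination is reached before ¼ mile, and it is not the procedure used by the study team, who located endpoints with Google Maps.

```python
from itertools import accumulate

QUARTER_MILE_FT = 1320.0  # 0.25 mile expressed in feet (5280 / 4)


def segments_to_audit(segment_lengths_ft, limit_ft=QUARTER_MILE_FT):
    """Return how many consecutive segments belong to the audited route.

    The route ends at the conclusion of the segment that contains the
    limit_ft endpoint, so the last audited segment may extend past it.
    """
    for count, cumulative in enumerate(accumulate(segment_lengths_ft), start=1):
        if cumulative >= limit_ft:
            return count
    # The whole path is shorter than the limit: audit every segment.
    return len(segment_lengths_ft)


# Hypothetical segment lengths along the shortest path to a destination.
lengths = [400.0, 500.0, 600.0, 300.0]
audited = segments_to_audit(lengths)
print(audited, sum(lengths[:audited]))
```

In this hypothetical case the third segment contains the ¼ mile point, so three segments are audited and the walked distance exceeds ¼ mile.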
Procedures for data collection
Training and data collection procedure
MAPS data were collected in 2009–10. A research staff manager was responsible for training, route planning, and quality control for each data collection site cohort. Multiple raters at each study site were trained extensively to use the MAPS tool over a 3-day training and certification period (see training manual online at http://sallis.ucsd.edu). To be certified to rate independently, raters had to complete at least four route assessments with inter-rater reliability ≥95% agreement.
Raters began MAPS auditing at a participant’s residence and walked along the designated route on the same side of the street. In inclement weather, raters drove 0.2% (San Diego) to 5% (Seattle) of the routes. The items on the route section were collected across the entire ¼ mile. When the rater crossed the street, either at a designated intersection with or without a pedestrian crossing, or due to an obstruction in the walkway, a new crossing section was completed, along with a new segment section. When a street changed names, a new segment section was started. Cul-de-sacs or street dead-ends that were within 400 feet of a participant’s home were rated using the cul-de-sac section (child and teen studies only).
All routes in the reliability sample were completed by a combination of two people drawn from the pool of certified raters. The two raters independently coded 13.7% of participants’ ¼ mile routes for reliability purposes. The second ratings were completed within 1 week of the first, typically on different days. Each rater traveled the route independently and completed assessments without consulting with the other. The final reliability analysis sample included 290 route pairs, 516 segment pairs, 319 crossing pairs, and 53 cul-de-sac pairs.
The majority of the tools were completed using paper and pencil, though some were completed using a tablet PC. Residential route sections were completed in 28.5 minutes on average (range = 5-120 minutes) and commercial route sections were completed in 18.5 minutes on average (range = 2-75 minutes). Raters and coordinators reviewed each tool for missing and discrepant items. If more than 5% of items were missing, raters returned to the route and completed the missing items. Google Earth was used to fill in missing items when re-rating was not possible (approximately 1% of cases), and to verify outliers and slope measurements.
Conceptual approaches to the MAPS scoring system
Item-level scoring and data analysis
Items from each of the three main MAPS sections were sorted a priori into subscales based on group consensus (see http://sallis.ucsd.edu for item placement into subscales and results tables for sample items in each subscale). Scoring conventions were developed and, when possible, were used to simplify scoring. Most items in MAPS were coded dichotomously (no/yes) and scored as 0/1. Frequency items (0, 1, 2+) were scored as 0, 1, 2, and continuous and descriptive items were dichotomized or trichotomized based on their distributions, theoretical relevance, and compatibility with other scale items’ scoring. In several instances, related items needed to be combined into single variables to be meaningful components of their respective subscales. For example, sidewalk and walking path presence were collected as two separate items, but for interpretation they were combined to create one variable for the subscale. In such cases, the new variable was computed and then coded for scoring (e.g., di- or trichotomized) consistently with theoretically related items and to match the other scale items’ scoring. The slope items in the segment section were scored separately for seniors compared to the other age groups. The rationale was that seniors may be more sensitive to higher slopes than younger people, so their activity may be negatively affected at lower grade slopes.
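The scoring conventions described above can be sketched as a few small helper functions. This is an illustrative sketch only: the function names and the cutpoints in the example are hypothetical, and the actual item transformations are defined in the syntax available at http://sallis.ucsd.edu.

```python
def score_dichotomous(present):
    """No/yes items are scored 0/1."""
    return 1 if present else 0


def score_frequency(count):
    """Frequency items recorded as 0, 1, 2+ are scored 0, 1, 2."""
    return min(int(count), 2)


def score_by_cutpoints(value, cutpoints):
    """Di- or trichotomize a continuous value.

    The score is the number of cutpoints the value reaches, so one
    cutpoint yields 0/1 and two cutpoints yield 0/1/2.
    """
    return sum(value >= c for c in cutpoints)


def combine_any(*items):
    """Combine related no/yes items into a single 0/1 variable."""
    return 1 if any(items) else 0


# Hypothetical observations for one segment.
sidewalk_present, walking_path_present = False, True
print(combine_any(sidewalk_present, walking_path_present))  # combined presence
print(score_frequency(5))                                    # capped at 2
print(score_by_cutpoints(7.0, cutpoints=[5.0, 10.0]))       # trichotomized
```

Keeping every item on a 0/1 or 0/1/2 scale means that no single item can dominate a summed subscale, which is consistent with the stated aim of matching scoring across items within a scale.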
The Kappa statistic was used to assess the inter-rater reliability of dichotomous variables, accounting for the effects of chance agreement. Kappa values were judged using a modified version of Landis and Koch’s adjectival classification system: “good to excellent” (≥0.60), “moderate” (0.41-0.60), or “fair to poor” (≤0.40). Items with poor agreement (Kappa or ICC <0.40) were dropped from scales unless expert opinion (of those on the study team) determined their necessity/relevance. For example, some item reliabilities were poor due to low frequency. Items excluded from scale creation were kept in the MAPS tool.
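For readers unfamiliar with the statistic, Cohen’s Kappa for two raters is (observed agreement − chance agreement) / (1 − chance agreement). The following minimal sketch computes it and applies the classification bands given above. The analyses in this paper were run in SPSS; the function names and example ratings here are hypothetical, and treating values between 0.40 and 0.41 as “moderate” is an assumption, because the published bands leave that gap unspecified.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters coding the same items."""
    a, b = list(rater_a), list(rater_b)
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal category proportions.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:
        return None  # no variability: Kappa is undefined
    return (observed - expected) / (1 - expected)


def classify_coefficient(value):
    """Apply the adjectival bands used for Kappa (and ICC) values."""
    if value >= 0.60:
        return "good to excellent"
    if value > 0.40:  # assumption: covers the 0.40-0.41 gap
        return "moderate"
    return "fair to poor"


# Hypothetical no/yes codes from two raters for four routes.
first = [1, 1, 0, 0]
second = [1, 0, 0, 0]
kappa = cohen_kappa(first, second)
print(kappa, classify_coefficient(kappa))
```

The `None` return mirrors the situation reported for rare items, where limited variability makes the coefficient impossible to calculate.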
Continuous or ordinal items were assessed for inter-rater reliability using one-way random effects single-measure intraclass correlation coefficients (ICC). Despite the lack of consensus in interpreting ICC values, the scale of ICCs is similar to that of Kappa, and so the classification system for Kappa values described above was also used to evaluate ICC values. Because low variability can affect Kappa and ICC values, the percent agreement was also calculated for all variables [19, 31, 33]. Percent agreement was classified using the following criteria (modified from [19, 31]): “good to excellent” (≥75%), “moderate” (60-74%), and “fair to poor” (<60%). Items were considered acceptable if one or both measures of reliability were moderate or higher. All individual item statistics, including those from dropped items, and syntax for creating item transformations and subscales can be found at: http://sallis.ucsd.edu.
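The one-way random effects single-measure ICC can be written as (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between-target and within-target mean squares and k is the number of raters. The sketch below computes this coefficient and percent agreement, then applies the acceptability rule stated above. It is an illustration under assumptions, not the SPSS syntax used for the paper: the function names and example data are hypothetical, and “moderate or higher” is taken to mean a coefficient above 0.40 or agreement of at least 60%.

```python
def icc_oneway_single(ratings):
    """One-way random effects, single-measure ICC.

    ratings: one sequence per rated target, each holding k ratings.
    """
    rows = [list(r) for r in ratings]
    n = len(rows)
    k = len(rows[0]) if rows else 0
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 targets, each with the same k >= 2 ratings")
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for r, m in zip(rows, means) for x in r) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        return None  # no variability at all: ICC is undefined
    return (msb - msw) / denominator


def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same score."""
    pairs = list(zip(rater_a, rater_b))
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def is_acceptable(coefficient, pct):
    """Acceptable if either measure is moderate or higher (assumed cutoffs)."""
    return (coefficient is not None and coefficient > 0.40) or pct >= 60.0


# Hypothetical subscale scores from two raters on four routes.
pairs = [(1, 1), (2, 2), (3, 3), (4, 5)]
icc = icc_oneway_single(pairs)
pct = percent_agreement([a for a, _ in pairs], [b for _, b in pairs])
print(round(icc, 3), pct, is_acceptable(icc, pct))
```

Reporting percent agreement alongside the coefficient matters for rare items: two raters can agree on nearly every route while Kappa or ICC stays low or undefined because almost all scores are identical.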
All data were analyzed using SPSS version 17.0 (SPSS, Inc., Chicago, IL). For each item (original and modified), frequency, range, mean, standard deviation, and inter-rater reliability were calculated.
Subscale creation and analysis
Scales were created and analyzed based on the reliability samples from all studies combined. After the remaining (not dropped) items were transformed as necessary, the subscale scores were computed by summing the items’ scores. The subscales were then sorted by expected positive or negative effects on physical activity, to create the valence scores (see Figures 1 and 2). For instance, the sum of the positive destinations and land uses was thought to be positively associated with physical activity, and the presence of social disorder was thought to be negatively associated. All of the positive subscales were summed to create the positive valence score, and likewise the negative subscales were summed to create the negative valence score. Finally, an overall section score (positive valence score minus negative valence score) was calculated for each of the three main sections. Total MAPS positive, negative, and overall scores can be calculated by summing the three main sections’ positive, negative, and overall summary scores.
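The tiered aggregation described above (items to subscales, subscales to valence scores, valence scores to section and total scores) can be sketched as follows. The subscale names and all numeric values are hypothetical placeholders chosen for illustration; the actual item-to-subscale assignments are documented at http://sallis.ucsd.edu.

```python
def subscale_score(item_scores):
    """A subscale score is the sum of its (already recoded) item scores."""
    return sum(item_scores)


def section_scores(positive_subscales, negative_subscales):
    """Valence scores and the overall score for one MAPS section."""
    positive = sum(positive_subscales.values())
    negative = sum(negative_subscales.values())
    return {"positive": positive, "negative": negative,
            "overall": positive - negative}


def total_scores(sections):
    """Total MAPS scores: sum each summary score across the main sections."""
    return {key: sum(section[key] for section in sections)
            for key in ("positive", "negative", "overall")}


# Hypothetical scores for one audited route.
route = section_scores(
    {"shops": subscale_score([1, 2, 0]), "public_recreation": 1},
    {"adverse_land_uses": subscale_score([1, 0, 0])},
)
segments = section_scores({"sidewalk_positive": 3, "trees": 2},
                          {"sidewalk_negative": 1})
crossings = section_scores({"crosswalk_amenities": 2},
                           {"crossing_impediments": 2})
print(route)
print(total_scores([route, segments, crossings]))
```

Because every level is a simple sum or difference, any aggregate can be traced back to the specific subscales and items that produced it, which supports analysis at whichever level of specificity a study requires.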
For the subscales and the positive/negative/overall scores, frequency, range, mean, standard deviation, and inter-rater reliability (ICC and % agreement) were also assessed using the acceptability criteria described above. Subscale reliability scores were considered to be acceptable if ICC values were moderate or higher. In some cases, several iterations of subscales were run, with step-wise exclusion of low reliability items considered important conceptual indicators. If inclusion of low reliability but relevant items did not reduce the subscale’s reliability below acceptable standards, the items in question were included.
Average census block group-level descriptive statistics of reliability sample areas (mean, SD)
Median household income ($)
Median age (years)
Percent of residents with college degree or higher (%)
Percent residents non-Hispanic White (%)
Net Residential Density
Retail Floor Area Ratio
San Diego County, CA
Seattle/King County, WA
Route subscale characteristics: all studies combined (n = 290 route reliability pairs)
# items (range of scores)
Sample items* and overall subscale description
ICC, % agreement
Range of item ICCs or Kappas
Land Use and Destinations Subscales
Residential Mix (weighted residential density)
Single family homes, apartments/condominiums, apartments above street retail
.292 (retirement/senior living facilities) - .776
Food-related land uses, retail and service-oriented land uses and shopping centers
Food-related uses (fast food, sit-down, café), entertainment
Bank/credit union, health-related professional, other services
Institutional/Services-Religious, Schools (each a single item)
1 each (0–2 each)
Government or community land use: Place of worship; school
Same as subscales
Health or social services, library/museums, post office, senior center
.279 (senior center) -.798
Parking Structures (positive influence on PA)
No parking facilities present, parallel/angled on-street parking
-.011 (no parking) -.689
Recreational Land Use-Public Recreation Facilities
Community garden, public indoor, public outdoor pay, public park
Recreational Land Use-Private Recreation Facilities
Private indoor, private outdoor
DLU Commercial (an interim subscale, may be used independently, but not included in overall scores)
3 subscales (0–21)
Sum of shops, restaurant/entertainment, and services subscales. Subscale created to reflect most common pedestrian destinations. Not included in overall positive subscale.
DLU Overall Positive Subscale
10 subscales (2–24)
Sum of subscales: residential mix, shops, restaurants/entertainment, services, government services, religious, schools, positive parking, public recreation, and private recreation.
Adverse Land Uses: Industrial, Abandoned Lot/Building, Surface Parking Lot or Garage **ALSO IS DLU Overall Negative
Warehouse/factory/industrial, abandoned building, large parking facilities
-.029 (abandoned building)-.659
DLU Overall Subscale Score
2 subscales (0–21)
DLU Overall Positive subscale – Adverse Land Uses subscale
Positive Elements Subscale
Transit stops, posted speed limit, pedestrian signage, street amenities (e.g., working telephone, trash bins)
.395 (presence of kiosks or info booths) - .838
Negative Elements Subscale
High speed limits, roll-over curbs, driveways
Overall Streetscape Score
2 subscales (−3 – 10)
Positive Streetscape Elements subscale– Negative Streetscape Elements subscale
Aesthetics and Social Subscales
Positive Aesthetics and Social Subscale
Public art, landscaping maintenance
.391 (signage for commercial destinations or parks) -.689
Negative Aesthetics and Social Subscale
Graffiti, physical disorder, broken windows
.088 (social disorder; dichotomized: none vs. any) - .665
Overall Aesthetics and Social Score
2 subscales (−8 – 5)
Positive Aesthetics and Social -Negative Aesthetics and Social Subscales
Total Route Score
3 overall subscales (−2 – 33)
Sum of 3 overall scores
For the destinations and land use route section, there were ten positive subscales and one negative subscale (Table 4; Figure 1). The positive destinations and land use subscales included residential mix (weighted residential density scores), shops, restaurants/entertainment, professional services, religious, schools (each of the latter two a single item), government services, positive parking structures, public recreational land use, and private recreation facilities. Nine out of ten (90%) of the positive destinations and land use subscales had good to excellent inter-rater reliability, with ICC (or Kappa for the two single-item subscales) values ranging from .652 (government services) to .873 (shops). The one subscale with moderate agreement was residential mix (ICC: .577). The overall positive destinations and land use valence score was a sum of all ten positive subscales and had good/excellent reliability (ICC: .855). The negative destinations and land use valence score consisted only of the adverse land uses and demonstrated good/excellent reliability (ICC: .610; Table 4). The destinations and land use subsection score (positive valence score minus negative valence score) had good/excellent reliability (ICC: .801).
Due to the low number of items and relative homogeneity of item content, the route streetscape items were aggregated into a positive and a negative valence score and a streetscape subsection score (positive minus negative valence score; Figure 1). The positive streetscape items that were thought to influence walking included items such as pedestrian signage and street amenities (e.g., drinking fountains, trash cans; Table 4). The negative items that were thought to deter walking included items such as high speed limits, multiple driveways, and non-barrier curbs. All of the streetscape valence and overall scores had good/excellent inter-rater reliability: positive (ICC: .741), negative (ICC: .742), and overall (ICC: .762).
The route aesthetics and social items had the same structure as for streetscape: positive and negative valence scores, and an aesthetics and social subsection score (positive minus negative valence score; Figure 1). The positive items thought to influence walking included availability of public art, landscaping, and general aesthetic maintenance, and the positive valence score had good/excellent reliability (ICC: .632). The negative items thought to negatively influence walking included physical disorder, broken windows, and graffiti. The negative aesthetics and social valence score had moderate reliability (ICC: .514), as did the overall aesthetics and social subsection score (ICC: .580).
The overall route score was calculated from the sum of the three route subsection scores (destinations and land use, streetscape, and aesthetics and social). The route overall score had good/excellent reliability (ICC: .816). For the route valence and subsection scores, three out of three for destinations and land use had good/excellent reliability, for streetscape, three out of three had good/excellent reliability, and for aesthetics and social, one out of three had good/excellent reliability, with two having moderate reliability. In sum, of the route valence, subsection, and overall scores, seven out of nine (77.8%) had good/excellent reliability, and the remaining two (22.2%) had moderate reliability.
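The route-level counts reported in this section can be tallied to show how they relate to the 17 of 20 (85.0%) figure given in the abstract. The grouping below is an interpretation rather than something stated explicitly in the text: it assumes the 20 route scores comprise the ten positive destinations and land use subscales, the nine valence and subsection scores, and the single overall route score.

```python
from collections import Counter

# Reliability classes as reported in the text, grouped under the stated assumption.
route_scores = (
    ["good/excellent"] * 9 + ["moderate"] * 1    # ten positive land use subscales
    + ["good/excellent"] * 7 + ["moderate"] * 2  # nine valence and subsection scores
    + ["good/excellent"] * 1                     # overall route score
)

tally = Counter(route_scores)
share_good = 100 * tally["good/excellent"] / len(route_scores)
print(len(route_scores), tally["good/excellent"], tally["moderate"], share_good)
```

Under this grouping the tally gives 20 scores, of which 17 are good/excellent and 3 are moderate, matching the abstract.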
Segment subscale characteristics: all studies combined (n = 516 segment reliability pairs with complete sidewalk data)
# items (range)
Sample items* and overall subscale descriptions
ICC, % agreement
Range of item ICC or Kappa
Building Height and Setbacks
Smallest and largest setbacks and building height
Building Height: Road Width and Setback Ratio
Smallest and largest setbacks, building height, and road width
Sidewalk Positive Qualities
Sidewalk presence and width
Buffer presence and width
Marked bicycle lane, signage
Building Aesthetics and Design
Street-level windows, building colors and materials
.549 - .629
Number and spacing of trees, percent of sidewalk shaded
Informal Path (single item)
Is there an informal path (shortcut) which connects to something else?
.554 (K), 91.1%
7 subscales plus 1 item (3–22)
Sum of subscales: building height and setbacks, sidewalk positive qualities, buffers, bike infrastructure, building aesthetics and design, trees, plus item: cul-de-sac connectivity
Sidewalk Negative Qualities
Trip hazards, obstructions in the sidewalk
Sidewalk Steepness (children/teens)
Slope, cross-slope (steeper slope acceptable for children)
Sidewalk Steepness (seniors)
Slope, cross-slope (less steep slope acceptable for seniors)
Overall Negative Subscale (Child/Teen)
2 subscales (0–7)
Sum of subscales: Sidewalk negative qualities, sidewalk steepness (children/teens), building height: road width and setback ratio, negative street design/width
Overall Negative Subscale (Senior)
2 subscales (0–9)
Sum of subscales: Sidewalk negative qualities, sidewalk steepness (seniors), building height: road width and setback ratio, negative street design/width
Overall Segments Score (Child/Teen)
2 (−1 -19)
Overall Positive – Overall Negative subscales (child/teen)
Overall Segments Score (Senior)
2 (−2 -19)
Overall Positive – Overall Negative subscales (senior)
Crossing subscale characteristics: all studies combined (n = 319 crossing reliability pairs)
# items (range)
Sample items* and overall subscale description
ICC, % Agreement
Range of item ICC or Kappa
Crosswalk characteristics (e.g., marked crosswalk, high visibility markings)
-.012 (curb extensions) - .816
Pre- and post-crossing curb lining up with crossing
Intersection Control and Signage
Stop signs, pedestrian walk signals
.327 (traffic circle) - .811
Overall Positive Crossing Characteristics Subscale
3 subscales (0–12)
Sum of subscales: crosswalk amenities/qualities, curb quality/presence, intersection control and signage
Lanes/Road Width of Crossing
Distance of crossing leg (# lanes wide, trichotomized)
No curb ramp, gutters in crossing, faded/worn crosswalk markings
.188 (poor visibility at corners) - .893
Overall Negative Crossing Characteristics Subscale
2 subscales (0–5)
Sum of subscales: Lanes/Road Width of Crossing, Crossing Impediments
Overall Crossings Score
2 subscales (−4-8)
Sum of subscales: Overall Positive Crossing Characteristics-Overall Negative Crossing Characteristics
There were 53 cul-de-sac reliability pairs. Sample items included questions about the size of the cul-de-sac, smoothness of the pavement, and whether parking was allowed. Twelve items were included in the subscale, and fourteen were excluded. Of the included items, none had low reliability, one had moderate reliability, 8 had good/excellent reliability (highest ICC: .809), and 3 items (25.0%) were too rare to calculate Kappa or ICC. The twelve items with consensus on directionality (for impact on physical activity) were summed to create one overall positive subscale with scores ranging from 3 to 9. The subscale had a mean of 6.5 (SD: 1.59), and it demonstrated good/excellent reliability (ICC: .847).
A systematic coding scheme was developed for the MAPS streetscape observation tool that summarized 160 items into 26 specific subscales, 10 valence scores, and one final overall score for each of the MAPS sections: route, crossings, segments, and cul-de-sacs. A majority of items (75.6%) and all but one of the subscales (96.1%) had moderate or good/excellent reliability. The most common items with lower agreement were those assessing slope, subjective qualities (e.g., the social environment/disorder), and those with complex response options (e.g., percentage of sidewalk shaded by trees, number of façade types/colors). When necessary, reconceptualization, recoding, and editing were carried out for under-performing items and scale scores. This scoring and scale development approach allows examination of microscale attributes at varying and tiered levels of specificity and complexity. The guiding assumption was that the aggregate environmental impact is likely to be most influential on physical activity. It is possible that critical positive or negative single factors can override an aggregate score, and the present scoring system allows examination of detailed items, specific conceptual categories, or aggregate scores. Thus, all coding and scoring decisions were made considering expected effects on physical activity, but the scales may be relevant for other outcomes, such as social interactions, social capital, perceptions of environments, and neighborhood satisfaction as well.
Numerous microscale tools have been developed, reflecting a recognition that the details of streetscapes are likely to affect experience and behavior beyond macro-level variables (e.g., street connectivity) that are studied much more often. However, the lack of well-developed and accessible scoring systems, along with the expense of data collection, seems to have inhibited the use of microscale tools, because there are few published reports using microscale data [9, 27, 34, 35]. The complexity of the measures also appears to have been a deterrent to developing scoring systems because of differing response scales, item formats, and conceptual domains. Many of the scoring systems that have been developed have not been tested for inter-rater reliability or reliability has not been reported in the literature [14–16, 19, 21, 36]. Thus, the present scoring system represents an advance, and the principles used may be applicable to other microscale tools.
Though all published microscale measures assess a wide range of attributes, MAPS may be more comprehensive because it assesses and provides scores for routes, crossings, segments, and cul-de-sacs. Present results are expected to be generalizable, at least to metropolitan areas in the United States, because data came from three geographic regions (Baltimore, MD/Washington, DC, San Diego, CA, and Seattle, WA) and included neighborhoods representing a wide range of urban design and income. Though MAPS has not been used in rural areas, it could be applicable to such settings as well, although further development of micro-scale evaluation in such areas is needed.
The innovative approach of evaluating routes beginning with each study participant's home was adopted for two major reasons. First, the routes were highly personalized and designed to reflect the environment each participant would likely encounter while walking or bicycling to a nearby destination. Second, the route approach was efficient and controlled the time cost of assessing each person’s local environment. Selecting routes was appropriate for the parent studies because the participants’ locations were known. Other microscale tools were designed to audit a sample of street segments and then create summary scores that can be related to individual behavior [9, 19, 20, 33]. However, there are major questions about how many segments or what percent of neighborhood streets need to be observed to represent an adequate sample. Observed streets may or may not be representative of the environment experienced by a given participant. The larger the areas over which environmental characteristics are averaged, the less valid they are likely to be for each participant. Thus, observation of individual routes has important procedural advantages for some types of studies and is likely to produce more valid estimates of effect sizes in relation to physical activity and other outcomes.
Another advantage of the scoring system outlined in the present paper is its generalizability. The principles used to score MAPS may be applicable to other audit tools. The principles included the combination of conceptual- and policy-driven scoring systems, items scored as dichotomies or trichotomies, hypothesis-driven classification into positive or negative subscales for expected effects on physical activity, and the inclusion of a hierarchy of scores from construct-specific to aggregated. One likely benefit of this approach is that specific construct measures may be useful for guiding city planning and transportation decisions. It also may help residents understand the strengths and weaknesses of their microscale (street and pedestrian) environments, including assessing local disparities in environmental quality. The overall scores and aggregated scores may be better suited for explaining outcomes because cumulative effects across numerous attributes are more likely than individual items to influence behavioral outcomes and neighborhood perceptions.
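The hierarchy of scores described above can be illustrated with a minimal Python sketch. The item names, subscale groupings, and the use of simple sums within subscales are assumptions made for illustration only; they are not the actual MAPS items or scoring rules. The overall score is computed as the positive valence score minus the negative valence score, which is one way to combine valence scores.

```python
from typing import Dict, List

# Hypothetical item names and groupings, for illustration only.
# They are NOT the actual MAPS items or subscales.
POSITIVE_SUBSCALES: Dict[str, List[str]] = {
    "aesthetics": ["trees_present", "well_kept_buildings"],
    "sidewalk": ["sidewalk_present", "buffer_present"],
}
NEGATIVE_SUBSCALES: Dict[str, List[str]] = {
    "disorder": ["graffiti_present", "litter_present"],
}


def subscale_scores(items: Dict[str, int],
                    groups: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum dichotomous (0/1) or trichotomous (0/1/2) item codes per subscale."""
    return {name: sum(items[i] for i in members)
            for name, members in groups.items()}


def score_section(items: Dict[str, int]) -> Dict[str, object]:
    """Build the tiers: subscales, valence scores, and one overall score."""
    positive = subscale_scores(items, POSITIVE_SUBSCALES)
    negative = subscale_scores(items, NEGATIVE_SUBSCALES)
    positive_valence = sum(positive.values())
    negative_valence = sum(negative.values())
    return {
        "subscales": {**positive, **negative},
        "positive": positive_valence,
        "negative": negative_valence,
        # Assumed aggregation: negatives subtracted from positives.
        "overall": positive_valence - negative_valence,
    }


# Hypothetical coded observations for one route.
example = {
    "trees_present": 1, "well_kept_buildings": 1,
    "sidewalk_present": 1, "buffer_present": 0,
    "graffiti_present": 1, "litter_present": 0,
}
result = score_section(example)
print(result["subscales"], result["positive"], result["negative"], result["overall"])
```

Because every tier is retained in the result, an analyst could examine a single subscale (e.g., the hypothetical "disorder" score) or the aggregate score, depending on the research question.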
The MAPS scoring and reliability process included several limitations. The decision-making process for item retention and scale creation was driven by theory [28, 38, 39], research team discussion, statistical criteria, policy-relevance, and expert consensus. While these strategies included accepted methods for such decision-making, there were ambiguous cases and situations. The originally proposed conceptually-driven process had to be modified throughout the course of data reduction. While we present one method of combining items and subscales, subtracting negatives from positives may not reflect the complexity of various microscale elements. Ultimately, scales were created with the aforementioned blend of theory, practicality, and expert consensus in hopes that this method may prove useful to wider audiences and researchers. The present scoring system may not be optimal, and other approaches to combining items and examining patterns of attributes should be explored.
A limitation of the design is that the selected route may not be in a direction usually taken by the participant, or a participant may take multiple routes to the destination(s) of interest. However, the route selection procedure was designed to identify a likely route in the direction of the nearest destination. In addition, the most important environmental attributes (e.g., a high-speed roadway) may have been just beyond the ¼-mile designated route. This issue can be better addressed in forthcoming validity analyses.
Quantifying the diversity of built environment variables is inherently complicated, and rarely does one answer fit all situations. Given that the focus of the present scoring process was on influences relevant to physical activity and the desire to link variables with planning and transportation policies, some of the decisions may have created scales that do not apply to all potential future uses. For example, present scoring may not be generalizable if scoring neighborhoods for relevance to social capital or in very different contexts such as Asian or African cities. Further, the three regions sampled herein likely do not represent the entire range of environments in which MAPS may be used in the future. During analysis, many of the dropped items were determined to contain ambiguous wording or concepts. Future investigators can revise or exclude items in a modified MAPS but should carefully describe changes and additions, including validation of any changes.
Typical methods of scale development include factor analysis and assessment of internal consistency (i.e., Cronbach’s alpha), but they were not used in the present study. These approaches are appropriate for psychological measures but may not be for environmental assessment. Features that are conceptually related (e.g., street lights and speed limit for safety) may not co-exist, so internal consistency is not a sufficient index of scale quality. The present goal was to create conceptually coherent and policy-relevant scales. By identifying clusters of environmental attributes under the jurisdictions of planning, transportation, and public works departments, results can be more relevant for establishing accountability for making environmental improvements that would be expected to promote physical activity.
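The point about internal consistency can be illustrated with a short Python sketch that computes Cronbach's alpha from its standard definition. The two items and the six coded segments below are invented for illustration and are not MAPS data. Both items plausibly belong to a safety construct, yet they do not co-occur in this hypothetical sample, so alpha is very low even though the grouping is conceptually coherent.

```python
from statistics import variance
from typing import List


def cronbach_alpha(items: List[List[int]]) -> float:
    """Cronbach's alpha for k items, each scored on the same n observations.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Total score for each observation (e.g., each street segment).
    totals = [sum(obs) for obs in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 0/1 codes for six segments (not MAPS data).
street_lights = [1, 1, 0, 0, 1, 0]
low_speed_limit = [0, 1, 1, 0, 0, 1]

alpha = cronbach_alpha([street_lights, low_speed_limit])
print(round(alpha, 2))  # negative for these data, because the items rarely co-occur
```

A low or negative alpha in a case like this would say little about whether the two features belong in the same policy-relevant scale, which is consistent with the reasoning above for not relying on internal consistency.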
The next analytic step is to conduct validity analyses that will relate these scale scores to total physical activity and specific walking outcomes (e.g., walking to school, for leisure, for transport) in multiple age groups. Another next step would be to develop an electronic-based interface for integrated data collection and scoring (e.g., iPad app). Following validation, these scales can be used for research and practice purposes. As tools like MAPS are used in different environments and countries, and by investigators interested in different outcomes, the tool itself is likely to be modified. It is difficult to imagine how a standard version could be developed that could be used for community intervention and policy advocacy purposes and be comparable across locations. As one step toward more frequent use of streetscape audits, validity analyses could result in a shorter form containing the most valid items and scales.
This paper presents a theoretically based, policy-relevant scoring system for the MAPS environmental audit tool. The modest number of interpretable scores makes the tool more usable by researchers, policy makers, and practitioners. The MAPS tool and scoring system demonstrated strong reliability and can be recommended for wider use.
This study was supported by National Institutes of Health grants R01 ES014240, R01 HL083454, R01 HL67350 and R01 HL077141. DVD was supported by Research Foundation Flanders (FWO), B/09731/01. The authors wish to acknowledge Brooks Lecomte for his assistance with formatting references and checking item-level details.
- Ding D, Sallis JF, Kerr J, Lee S, Rosenberg DE: Neighborhood environment and physical activity among youth: a review. Am J Prev Med. 2011, 41: 442-455. 10.1016/j.amepre.2011.06.036.
- Gebel K, Bauman AE, Petticrew M: The physical environment and physical activity: a critical appraisal of review articles. Am J Prev Med. 2007, 32 (5): 361-369. 10.1016/j.amepre.2007.01.020.
- Heath GW, Brownson RC, Kruger J, Miles R, Powell KE, Ramsey LT: The effectiveness of urban design and land use and transport policies and practices to increase physical activity: a systematic review. J Phys Act Health. 2006, 3 (Suppl 1): S55-S76.
- Van Cauwenberg J, De Bourdeaudhuij I, De Meester F, Van Dyck D, Salmon J, Clarys P, Deforche B: Relationship between the physical environment and physical activity in older adults: a systematic review. Health Place. 2011, 17: 458-469. 10.1016/j.healthplace.2010.11.010.
- Brownson RC, Hoehner CM, Day K, Forsyth A, Sallis JF: Measuring the built environment for physical activity: state of the science. Am J Prev Med. 2009, 36 (Suppl 4): S99-123. 10.1016/j.amepre.2009.01.005.
- Saelens BE, Handy SL: Built environment correlates of walking: a review. Med Sci Sports Exerc. 2008, 40 (Suppl 7): S550-S556. 10.1249/MSS.0b013e31817c67a4.
- Brennan Ramirez LK, Hoehner CM, Brownson RC, Cook R, Orleans CT, Hollander M, Barker DC, Bors P, Ewing R, Killingsworth R, Petersmarck K, Schmid T, Wilkinson W: Indicators of activity-friendly communities: an evidence-based consensus process. Am J Prev Med. 2006, 31: 515-524. 10.1016/j.amepre.2006.07.026.
- Sallis JF, Slymen DJ, Conway TL, Frank LD, Saelens BE, Cain K, Chapman JE: Income disparities in perceived neighborhood built and social environment attributes. Health Place. 2011, 17: 1274-1283. http://dx.doi.org/10.1016/j.healthplace.2011.02.006.
- Boarnet MG, Forsyth A, Day K, Oakes JM: The street level built environment and physical activity and walking: results of a predictive validity study for the Irvine-Minnesota Inventory. Environ Behav. 2011, 43: 735-775. 10.1177/0013916510379760.
- Moudon AV, Lee C: Walking and bicycling: an evaluation of environmental audit instruments. Am J Health Prom. 2003, 18 (1): 21-37. 10.4278/0890-1171-18.1.21. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/13677960
- Day K: Audit tools comparison database. [http://activelivingresearch.org/files/AuditToolsComparisonTable.pdf]
- Badland HM, Opit S, Witten K, Kearns RA, Mavoa S: Can virtual streetscape audits reliably replace physical streetscape audits? J Urban Health. 2010, 87: 1007-16. 10.1007/s11524-010-9505-x.
- Clarke P, Ailshire J, Melendez R, Bader M, Morenoff J: Using Google Earth to conduct a neighborhood audit: reliability of a virtual audit instrument. Health Place. 2010, 16: 1224-1229. 10.1016/j.healthplace.2010.08.007. Retrieved from http://dx.doi.org/10.1016/j.healthplace.2010.08.007
- Alfonzo M, Boarnet MG, Day K, McMillan T, Anderson CL: The relationship of neighbourhood built environment features and adult parents’ walking. J Urban Des. 2008, 13: 29-51. 10.1080/13574800701803456.
- Brownson RC, Hoehner CM, Brennan LK, Cook RA, Elliott MB, McMullen KM: Reliability of two instruments for auditing the environment for physical activity. J Phys Act Health. 2004, 1: 189-207.
- Caughy MO, O’Campo PJ, Patterson J: A brief observational measure for urban neighborhoods. Health Place. 2001, 7: 225-36. 10.1016/S1353-8292(01)00012-0. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11439257
- Evenson KR, Sotres-Alvarez D, Herring AH, Messer L, Laraia BA, Rodríguez DA: Assessing urban and rural neighborhood characteristics using audit and GIS data: derivation and reliability of constructs. Int J Behav Nutr Phys Act. 2009, 6: 44. 10.1186/1479-5868-6-44.
- McCormack GR, Mâsse LC, Bulsara M, Pikora TJ, Giles-Corti B: Constructing indices representing supportiveness of the physical environment for walking using the Rasch measurement model. Int J Behav Nutr Phys Act. 2006, 13: 1-13. 10.1186/1479-5868-3-44.
- Pikora TJ, Bull FCL, Jamrozik K, Knuiman M, Giles-Corti B, Donovan RJ: Developing a reliable audit instrument to measure the physical environment for physical activity. Am J Prev Med. 2002, 23 (3): 187-94. 10.1016/S0749-3797(02)00498-1. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12350451
- Pikora TJ, Giles-Corti B, Bull F, Jamrozik K, Donovan R: Developing a framework for assessment of the environmental determinants of walking and cycling. Soc Sci Med. 2003, 56 (8): 1693-703. 10.1016/S0277-9536(02)00163-6. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12639586
- Jago R, Baranowski T, Zakeri I, Harris M: Observed environmental features and the physical activity of adolescent males. Am J Prev Med. 2005, 29: 98-104. 10.1016/j.amepre.2005.04.002.
- Frank LD, Saelens BE, Chapman J, Sallis JF, Kerr J, Glanz K, Couch SC, Learnihan V, Zhou C, Colburn T, Cain KL: Objective assessment of obesogenic environments in youth: geographic information system methods and spatial findings from the Neighborhood Impact on Kids (NIK) Study. Am J Prev Med. 2012, 42 (5): e47-55. 10.1016/j.amepre.2012.02.006.
- King AC, Sallis JF, Frank LD, Saelens BE, Cain K, Conway TL, Chapman JE, Ahn DK, Kerr J: Aging in neighborhoods differing in walkability and income: associations with physical activity and obesity in older adults. Soc Sci Med. 2011, 73: 1525-1533. 10.1016/j.socscimed.2011.08.032.
- Sallis JF, Conway TL, Kerr J, Saelens BE, Frank LD, Glanz K, Slymen DJ, Cain KL, Chapman JC: Adolescents’ Physical Activity as Related to Built Environments: TEAN Study in the US. Presentation at International Society Behavioral Nutrition and Physical Activity conference. 2011, Melbourne, Australia, June.
- Frank LD, Sallis JF, Saelens BE, Leary L, Cain K, Conway TL, Hess PM: The development of a walkability index: application to the Neighborhood Quality of Life Study. Br J Sports Med. 2010, 44: 924-933. 10.1136/bjsm.2009.058701.
- Kealey M, Kruger J, Hunter R, Ivey S, Satariano W, Bayles C, Ramirez B, Bryant L, Johnson C, Lee C, Levinger D, Mctigue K, Moni C, Moudon AV, Pluto D, Prohaska T, Sible C, Tindal S, Wilcox S, Winters K, Williams K: Engaging Older Adults to Be More Active Where They Live: Audit Tool Development. Proceedings of the 19th National Conference on Chronic Disease Prevention and Control. 2005, Atlanta, March.
- Hoehner CM, Brennan Ramirez LK, Elliott MB, Handy SL, Brownson RC: Perceived and objective environmental measures and physical activity among urban adults. Am J Prev Med. 2005, 28 (2 Suppl 2): 105-16. 10.1016/j.amepre.2004.10.023.
- Jacobs AB: Great streets. 1995, Cambridge: MIT Press. Retrieved from http://books.google.com/books?id=e5a-QgAACAAJ&pgis=1
- Cohen J: A coefficient of agreement for nominal scales. Ed Psych Measurement. 1960, 20: 37-46. 10.1177/001316446002000104.
- Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics. 1977, 33: 159-174. 10.2307/2529310.
- Saelens BE, Frank LD, Auffrey C, Whitaker RC, Burdette HL, Colabianchi N: Measuring physical environments of parks and playgrounds: EAPRS instrument development and inter-rater reliability. J Phys Act Health. 2006, 3 (Suppl 1): 190-207.
- Shrout PE: Measurement reliability and agreement in psychiatry. Stat Methods Med Res. 1998, 7: 301-317. 10.1177/096228029800700306.
- Boarnet MG, Day K, Alfonzo M, Forsyth A, Oakes M: The Irvine-Minnesota inventory to measure built environments: reliability tests. Am J Prev Med. 2006, 30: 153-159. 10.1016/j.amepre.2005.09.018.
- Brennan Ramirez LK, Hoehner CM, Brownson RC, Cook R, Orleans CT, Hollander M, Barker DC, Bors P, Ewing R, Killingsworth R, Petersmarck K, Schmid T, Wilkinson W: Indicators of activity-friendly communities: an evidence-based consensus process. Am J Prev Med. 2006, 31: 515-24. 10.1016/j.amepre.2006.07.026.
- Clifton K, Livi Smith A, Rodriguez D: The development and testing of an audit for the pedestrian environment. Landscape Urban Plan. 2007, 80: 95-110. 10.1016/j.landurbplan.2006.06.008.
- Pikora TJ, Giles-Corti B, Knuiman MW, Bull FC, Jamrozik K, Donovan RJ: Neighborhood environmental factors correlated with walking near home: Using SPACES. Med Sci Sports Exerc. 2006, 38: 708-714. 10.1249/01.mss.0000210189.64458.f3.
- Sallis JF, Bowles HR, Bauman A, Ainsworth BE, Bull FC, Craig CL, Sjöström M, De Bourdeaudhuij I, Lefevre J, Matsudo V, Matsudo S, Macfarlane DJ, Gomez LF, Inoue S, Murase N, Volbekiene V, McLean G, Carr H, Heggebo LK, Tomten H, Bergman P: Neighborhood environments and physical activity among adults in 11 countries. Am J Prev Med. 2009, 36: 484-490. 10.1016/j.amepre.2009.01.031.
- Duany A, Plater-Zyberk E, Speck J: Suburban nation: the rise of sprawl and the decline of the American Dream. 2000, North Point Press. Retrieved from http://books.google.com/books?hl=en&lr=&id=UZ0-0X4aiwQC&pgis=1
- Frank LD, Engelke PO, Schmid TL: Health and community design: the impact of the built environment on physical activity. 2003, Washington: Island Press. Retrieved from http://books.google.com/books?hl=en&lr=&id=1hG7nEznaqoC&pgis=1
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2458/13/403/prepub