6  Mapping Collective Efficacy

Show code
# Load Packages
library(tidyverse)
library(easystats)
library(gt)
library(sf)
library(haven)
library(readxl)
library(here)
library(janitor)
library(viridis)
library(plotly)
library(mapview)
library(leaflet)
library(htmlwidgets)
library(biscale)
library(base64enc)
library(htmltools)

options(scipen = 999)

6.0.1 Introduction

In the previous chapter, we analyzed bivariate relationships between item- and scale-level measures of collective efficacy and perceived and experienced criminal behaviors in residents’ neighborhoods. We found that individuals who perceive their neighborhood to be high in collective efficacy (at least for certain items meant to measure the key sub-dimensions of collective efficacy–social cohesion and informal social control) generally perceived less crime in their neighborhood and, to a lesser extent, report fewer direct experiences with criminal victimization. Of course, the theory of collective efficacy was not developed to explain individual-level differences in neighborhood perceptions and crime. Rather, it was conceived as a collective neighborhood-level corollary to the classic psychological concept of self-efficacy. Instead of emphasizing an individual’s belief in their ability to succeed in specific situations or accomplish a specific task, collective efficacy is meant to capture a community or neighborhood’s collective “mutual trust or willingness to intervene for the common good…(Sampson et al., 1997: p. 919)” such as controlling crime and antisocial behavior. With this in mind, we turn to visualizing neighborhood-level variations in collective efficacy and crime below.

6.0.2 Load the Data

First, we need to load the crime rate data with geographic information for mapping and the survey data with a neighborhood identifier that corresponds to the neighborhoods in the crime rate data (see Appendix II section 11.0.1). We will also load the pre-processed analytic data from the prior chapter that includes our collective efficacy and crime-related items, sub-scales, and scales.

Show code
#Geographic data:
kcmo_crimerate_sf <- readRDS(here("Data", "kcmo_crimerate_sf.rds"))
kc_combsurv_geoid <- readRDS(here("Data", "kc_combsurv_geoid.rds"))
#Survey data:
kc_combsurv_ceanal <- readRDS(here("Data", "kc_combsurv_ceanal.rds"))
kc_combsurv_recode <- readRDS(here("Data", "kc_combsurv_recode.rds"))

6.0.3 Map Geography & Crime Rate Data

We can start by mapping the geographic characteristics of the data, including the neighborhoods that were sampled. The sf package is designed to work with a host of different plotting packages. While the sf package has some built-in plotting features and we typically prefer the generality of ggplot2 for making plots in R, the “leaflet” package seems to provide access to one of the most all-purpose mapping tools. So we’ll use that below to allow for some degree of interactivity.

First, we can map whether the neighborhood was sampled and whether it was part of the random or high crime sample.

Show code
sample_pal <- colorFactor(palette = c("white", "#FDE725FF", "#440154FF"),
                   domain = kcmo_crimerate_sf$sample_fact)

sample_map <- leaflet(kcmo_crimerate_sf) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # Add Positron map tiles
  setView(lng = -94.5786, lat = 39.0997, zoom = 10) %>%  # KC coordinates  
  addPolygons(
    fillColor = ~sample_pal(sample_fact),
    weight = 1,
    opacity = 1,
    color = "black",
    dashArray = "",
    fillOpacity = 0.5,
    highlight = highlightOptions(
      weight = 3,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.5,
      bringToFront = TRUE),
    popup = ~paste("Neighborhood:", NBHNAME,
                   "<br>Population (2020):", SUM_POP20,
                   "<br>Sample:", sample_fact,
                   "<br>Violent Crime (per 1,000):", VC_1000_Rate,
                   "<br>Property Crime (per 1,000):", PC_1000_Rate,
                   "<br>Total Crime (per 1,000):", Crime_1000_Rate),
    label = ~NBHNAME
  ) %>%
  addLegend(pal = sample_pal, 
            values = ~sample_fact, 
            opacity = 0.7, 
            title = "Sample",
            position = "bottomright")
sample_map
Show code
saveWidget(sample_map, file = here("maps", "sample_map.html"))

In the above plot, hovering over the neighborhoods will reveal their names and clicking on the neighborhoods will reveal additional information, including the population from the 2020 census, whether it was sampled and which sample if so (random; high violent crime), and the crime rate (violent, property, and total) in 2024. You will notice that neighborhoods from the “High Crime Sample” are clustered around the geographic center of the city and just south of the central business district (Missouri River to the north and 31st street to the south).

Although redundant with information Dr. Kotlaja already has, we can also map the crime rate data as well. We present this in separate tabs below. We have capped the maximum values to be a round number that is close to 2 standard deviations above the mean crime rate for all crime, violent crime, and property crime respectively.

Show code
totcrime_pal <- colorNumeric(palette = "viridis",
                             domain = c(0, 300),
                             reverse = TRUE)

totcrime_map <- leaflet(kcmo_crimerate_sf) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # Add Positron map tiles
  setView(lng = -94.5786, lat = 39.0997, zoom = 10) %>%  # KC coordinates  
  addPolygons(
    fillColor = ~totcrime_pal(pmin(Crime_1000_Rate, 300)),
    weight = 1,
    opacity = 1,
    color = "black",
    dashArray = "",
    fillOpacity = 0.5,
    highlight = highlightOptions(
      weight = 3,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.5,
      bringToFront = TRUE),
    popup = ~paste("Neighborhood:", NBHNAME,
                   "<br>Population (2020):", SUM_POP20,
                   "<br>Sample:", sample_fact,
                   "<br>Violent Crime (per 1,000):", VC_1000_Rate,
                   "<br>Property Crime (per 1,000):", PC_1000_Rate,
                   "<br>Total Crime (per 1,000):", Crime_1000_Rate),
    label = ~NBHNAME
  ) %>%
  # fitBounds(lng1 = bbox[1], lat1 = bbox[2], 
  #           lng2 = bbox[3], lat2 = bbox[4]) %>%  
  addLegend(pal = totcrime_pal, 
            values = c(0, 300), 
            opacity = 0.7, 
            title = "Total Crime Rate<br>(per 1,000)",
            position = "bottomright")
totcrime_map 
Show code
saveWidget(totcrime_map, file = here("maps", "totcrime_map.html"))
Show code
violcrime_pal <- colorNumeric(palette = "viridis",
                              domain = c(0, 40),
                              reverse = TRUE)

violcrime_map <- 
leaflet(kcmo_crimerate_sf) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # Add Positron map tiles
  setView(lng = -94.5786, lat = 39.0997, zoom = 10) %>%  # KC coordinates  
  addPolygons(
    fillColor = ~violcrime_pal(pmin(VC_1000_Rate, 40)),
    weight = 1,
    opacity = 1,
    color = "black",
    dashArray = "",
    fillOpacity = 0.5,
    highlight = highlightOptions(
      weight = 3,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.5,
      bringToFront = TRUE),
    popup = ~paste("Neighborhood:", NBHNAME,
                   "<br>Population (2020):", SUM_POP20,
                   "<br>Sample:", sample_fact,
                   "<br>Violent Crime (per 1,000):", VC_1000_Rate,
                   "<br>Property Crime (per 1,000):", PC_1000_Rate,
                   "<br>Total Crime (per 1,000):", Crime_1000_Rate),
    label = ~NBHNAME
  ) %>%
  addLegend(pal = violcrime_pal, 
            values = c(0, 40), 
            opacity = 0.7, 
            title = "Violent Crime Rate<br>(per 1,000)",
            position = "bottomright")

violcrime_map
Show code
saveWidget(violcrime_map, file = here("maps", "violcrime_map.html"))
Show code
propcrime_pal <- colorNumeric(palette = "viridis",
                              domain = c(0, 300),
                              reverse = TRUE)

propcrime_map <- 
leaflet(kcmo_crimerate_sf) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # Add Positron map tiles
  setView(lng = -94.5786, lat = 39.0997, zoom = 10) %>%  # KC coordinates  
  addPolygons(
    fillColor = ~propcrime_pal(pmin(PC_1000_Rate, 300)),
    weight = 1,
    opacity = 1,
    color = "black",
    dashArray = "",
    fillOpacity = 0.5,
    highlight = highlightOptions(
      weight = 3,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.5,
      bringToFront = TRUE),
    popup = ~paste("Neighborhood:", NBHNAME,
                   "<br>Population (2020):", SUM_POP20,
                   "<br>Sample:", sample_fact,
                   "<br>Violent Crime (per 1,000):", VC_1000_Rate,
                   "<br>Property Crime (per 1,000):", PC_1000_Rate,
                   "<br>Total Crime (per 1,000):", Crime_1000_Rate),
    label = ~NBHNAME
  ) %>%
  addLegend(pal = propcrime_pal, 
            values = c(0, 300), 
            opacity = 0.7, 
            title = "Property Crime Rate<br>(per 1,000)",
            position = "bottomright")

propcrime_map
Show code
saveWidget(propcrime_map, file = here("maps", "propcrime_map.html"))

The charts above largely reproduce the story evident in the sampling map. Neighborhoods with higher crime are generally concentrated just south of the central business district. Of course, since the “high crime” sample was selected based on violent crime rates, the “high crime” sample neighborhoods correspondingly includes all but a few of the neighborhoods with the highest violent crime rates (notable exceptions not sampled include Hospital Hill, Northeast Industrial District, and Blue Valley Industrial neighborhoods).

6.0.4 Aggregating Survey Data

The above maps are simply plotting administrative data. More relevant to our purposes is to map the aggregated survey data on collective efficacy and perceived crime and experienced victimization to the neighborhood level. To do this, we first need to aggregate the individual-level data to the neighborhood level and map them similar to how we did above. But first, before reinforcing the formal neighborhood boundaries as defined by the Kansas City government, it is worth examining what survey respondents identified as their neighborhood.

6.0.4.1 Residents’ Perceived Neighborhoods

As Dr. Kotlaja anticipated, residents do not necessarily identify their neighborhood as the official neighborhood boundaries. A simple way to assess this is by identifying all unique combinations of the NBHD and Q1_1_Text (self-reported neighborhood name) variables in the survey data and then comparing alongside the official neighborhood names from the crime rate data.

Show code
kc_combsurv_recode %>%
  left_join(kc_combsurv_geoid) %>%
  select(ID, NBHD, Q1_1_TEXT, NBHID, NBHNAME) %>%
  dplyr::distinct(NBHD, Q1_1_TEXT, .keep_all = TRUE) %>%
  arrange(NBHD) %>%
  zap_label() %>%
  gt() %>%
    tab_options(
      container.height = px(500),
      container.overflow.y = TRUE)
ID NBHD Q1_1_TEXT NBHID NBHNAME
66 1 Wild berry 217 Prairie Point-Wildberry
70 1 Wildberry 217 Prairie Point-Wildberry
144 1 Prairie point-Wildberry 217 Prairie Point-Wildberry
274 1 217 Prairie Point-Wildberry
279 1 Zona Rosa 217 Prairie Point-Wildberry
280 1 Tiffany Springs 217 Prairie Point-Wildberry
13 2 Townhome 235 New Mark
47 2 Stratford Place Townhomes 235 New Mark
80 2 235 New Mark
82 2 Township 235 New Mark
148 2 New mark 235 New Mark
149 2 Newmark Brookings 235 New Mark
183 2 Oak Tree 235 New Mark
263 2 New marks brooking 235 New Mark
282 2 Normandy Courts 235 New Mark
73 3 Barrington woods 240 Shoal Creek
74 3 Harrington woods 240 Shoal Creek
147 3 Benson pl brookview 240 Shoal Creek
283 3 240 Shoal Creek
284 3 Cooley Highlands 240 Shoal Creek
285 3 Benson place 240 Shoal Creek
384 3 Benson Place 240 Shoal Creek
286 5 Holly Farms 232 Outer Gashland-Nashua
288 5 Nashua 232 Outer Gashland-Nashua
373 5 Cadence 232 Outer Gashland-Nashua
374 5 Holly Farm 232 Outer Gashland-Nashua
21 6 Barry harbour 219 Barry Harbour
155 6 Barry Barbour hunters ridge 219 Barry Harbour
289 6 Cedar Ridge 219 Barry Harbour
290 6 Barry Harbor 219 Barry Harbour
291 6 219 Barry Harbour
292 6 Barry Harbor/Hunters Ridge 219 Barry Harbour
65 7 Embassy Park 221 The Coves
127 7 North lake meadows 221 The Coves
170 7 The Coves 221 The Coves
173 7 North lakes 221 The Coves
257 7 Northlake 221 The Coves
293 7 North Lakes is the subdivision but I think maybe Tiffany Springs is the area? 221 The Coves
294 7 The Coves 221 The Coves
168 8 Woodfield 205 Ravenwood-Somerset
171 8 Ravenwood Somerset 205 Ravenwood-Somerset
256 8 Somerset 205 Ravenwood-Somerset
295 8 Ravenwood-Somerset 205 Ravenwood-Somerset
296 8 Ravenwood-Sommerset 205 Ravenwood-Somerset
298 8 Carriage Hill Estates 205 Ravenwood-Somerset
299 9 Maple Park 210 Maple Park
301 10 winnwood gardens 207 Winnwood Gardens
302 10 Winnwood Gardens 207 Winnwood Gardens
37 11 Lykins 17 Lykins
51 11 Northeast 17 Lykins
182 11 17 Lykins
211 11 Old Northeast 17 Lykins
237 11 Pendleton heights 17 Lykins
242 11 Lykins or ne 17 Lykins
20 12 Longfellow 12 Longfellow
22 12 12 Longfellow
23 12 Longfellow heights 12 Longfellow
222 12 Lomgfellow 12 Longfellow
34 13 62 Palestine East
36 13 Palestine 62 Palestine East
225 13 East Palestine 62 Palestine East
229 13 The 30s 62 Palestine East
308 13 Palestine East 62 Palestine East
310 13 Palestin area 62 Palestine East
1 14 Blue Valley 31 South Blue Valley
2 14 31 South Blue Valley
3 14 Vanbrunt 31 South Blue Valley
6 14 Blue valley 31 South Blue Valley
95 14 Van brunt 31 South Blue Valley
152 14 East side 31 South Blue Valley
316 14 South Blue Valley 31 South Blue Valley
44 15 60 Oak Park Southeast
250 15 Oak Park 60 Oak Park Southeast
318 15 Oak park 60 Oak Park Southeast
4 16 76 North Hyde Park
35 16 Hyde park 76 North Hyde Park
107 16 North Hyde park 76 North Hyde Park
145 16 Hyde Park 76 North Hyde Park
177 16 North Hyde Park 76 North Hyde Park
320 16 North hyde park 76 North Hyde Park
321 16 North Hydepark 76 North Hyde Park
17 17 Art institute/ south Moreland 80 Southmoreland
46 17 Southmoreland 80 Southmoreland
50 17 Souhmoreland 80 Southmoreland
123 17 South Moreland / Westport 80 Southmoreland
164 17 Westport 80 Southmoreland
45 18 River Market 2 River Market
106 18 River market 2 River Market
322 18 2 River Market
324 18 Market Station 2 River Market
52 19 Creek wood commons 192 Davidson
169 19 South Oakwood 192 Davidson
238 19 Williamsburg 192 Davidson
326 19 Cooley Highlands 192 Davidson
327 19 192 Davidson
330 19 Creekwood 192 Davidson
10 21 Ruskin Heights 171 Ruskin Heights
60 21 171 Ruskin Heights
64 21 Ruskin heights 171 Ruskin Heights
85 21 Ruskin 171 Ruskin Heights
210 21 Ruskin heights/ Hickman mills area 171 Ruskin Heights
259 21 Terrace 171 Ruskin Heights
260 21 Off manchester 171 Ruskin Heights
273 22 128 East Meyer 6
276 22 Roseville 128 East Meyer 6
269 23 131 Brown Estates
270 23 Brown estates 131 Brown Estates
154 24 178 Little Blue
9 25 Hickman Mills 169 Hickman Mills South
83 25 Hickman mills south 169 Hickman Mills South
84 25 169 Hickman Mills South
332 25 holiday hills 169 Hickman Mills South
7 26 Waldo 106 Tower Homes
62 26 Tower park 106 Tower Homes
129 26 Rock hill garden 106 Tower Homes
138 26 106 Tower Homes
150 26 Tower Homes 106 Tower Homes
186 26 Tower 106 Tower Homes
202 26 Tower homes 106 Tower Homes
209 26 Tower Park 106 Tower Homes
253 26 Rockhill Gardens 106 Tower Homes
15 27 Linden Hills 142 Linden Hills And Indian Heights
29 27 Linden hills 142 Linden Hills And Indian Heights
76 27 Linden Hill 142 Linden Hills And Indian Heights
146 27 Linden Hiils 142 Linden Hills And Indian Heights
181 27 Linden Hills and Indian heights 142 Linden Hills And Indian Heights
67 28 Waldo Homes 110 Waldo Homes
68 28 Waldo 110 Waldo Homes
142 28 110 Waldo Homes
335 28 Rock hill Gardens 110 Waldo Homes
14 29 Marlboro Height 95 Morningside
53 29 Morningside 95 Morningside
54 29 Wornell Homestead 95 Morningside
56 29 Morninside 95 Morningside
338 29 Brookside 95 Morningside
342 29 Morningside Neighborhood Is our neighborhood association 95 Morningside
43 30 186 Richards Gebaur
90 30 Grandview 186 Richards Gebaur
11 31 Paseo West 5 Paseo West
12 31 5 Paseo West
89 31 West Paseo 5 Paseo West
272 31 Rosehill Townhomes 5 Paseo West
345 31 Paseowest 5 Paseo West
24 33 key coalition 56 Key Coalition
25 33 Key Coalition 56 Key Coalition
26 33 Key Coaltion 56 Key Coalition
27 33 Not that I know of 56 Key Coalition
28 33 Key coalition 56 Key Coalition
109 33 Ivanhoe 56 Key Coalition
161 33 56 Key Coalition
215 33 Spring Hill 56 Key Coalition
16 34 Ivanhoe ne 55 Ivanhoe Northeast
39 34 55 Ivanhoe Northeast
40 34 Ivanhoe Gardens 55 Ivanhoe Northeast
41 34 Ivanhoe 55 Ivanhoe Northeast
233 34 Ivahoe 55 Ivanhoe Northeast
38 35 Dunbar gardens 66 Dunbar
108 35 Dunbar 66 Dunbar
151 35 66 Dunbar
348 35 Leeds 66 Dunbar
48 36 81 Old Westport
57 36 Valentine 81 Old Westport
112 36 Westport 81 Old Westport
114 36 Westport - nbhd org. name is Heart of Westport 81 Old Westport
258 36 Westport area 81 Old Westport
351 36 WESTPORT ENTERTAINMENT DISTRICT 81 Old Westport
352 37 Mount hope 51 Mount Hope
353 37 Boston Heights/Mount Hope 51 Mount Hope
354 37 I think it’s Mt. Hope 51 Mount Hope
5 38 Ivanhoe 54 Ivanhoe Southeast
199 38 Ivanhoe Neighborhood 54 Ivanhoe Southeast
235 38 Ivanhoe Se 54 Ivanhoe Southeast
252 38 54 Ivanhoe Southeast
130 39 134 East Swope Highlands
131 39 Manchester 134 East Swope Highlands
355 39 East Swope Highlands 134 East Swope Highlands
356 39 East Meyer 134 East Swope Highlands
277 40 122 Blenheim Square Research Hospital
8 41 Union Hill 11 Union Hill
30 41 Union hill 11 Union Hill
110 41 Ivanhoe, olive st. 11 Union Hill
227 41 The 30s 11 Union Hill
120 42 59 Oak Park Southwest
165 42 Oak park 59 Oak Park Southwest
366 42 Oak park southwest 59 Oak Park Southwest
368 42 Ivanhoe 59 Oak Park Southwest
370 43 Rosehill NA NA
371 43 NA NA
372 43 Bristol Park NA NA
377 43 Pembrooke Estates NA NA
379 43 Brandon mosley NA NA
380 43 Rosehill townhomes NA NA
381 43 Oak crest NA NA
382 43 Rolling Meadows NA NA
383 43 Tiffany Springs NA NA


Scrolling through the table above offers a basic sense of the differences in how residents identify their neighborhoods. Despite substantial overlap, there also appears to be much greater variation in residents’ reports of their neighborhood compared to official boundaries as anticipated (cf. here, here, here, and here). Nonetheless, for this initial report, we will rely on the official neighborhood boundaries since they permit direct comparisons with the aggregated crime data we received.

6.0.4.2 Aggregate key measures to the Neighborhood-level

The next step is to aggregate the survey data and merge it with the crime rate data. All we need for this are the NBHD variable that indicates the neighborhoods sampled and the specific measures we want to aggregate (e.g., collective efficacy, perceived violence, and experienced victimization). In addition to aggregating the data by calculating the mean of each of our key variables, we will also calculate standard errors and confidence intervals for the mean that we will use later to build our intuition about the uncertainty of these estimates of neighborhood-level values.

Show code
kc_combsurv_agganal  <- kc_combsurv_ceanal %>%
  dplyr::group_by(NBHD) %>%
  dplyr::summarize(n = n(),
                   #Social Cohesion:
                   mean_cohesion = mean(soc_cohesion, na.rm = TRUE),
                   sd_cohesion = sd(soc_cohesion, na.rm = TRUE),
                   n_cohesion = sum(!is.na(soc_cohesion)),
                   sem_cohesion = sd_cohesion / sqrt(n_cohesion),
                   t_cohesion = ifelse(n_cohesion > 1, qt(0.975, df = n_cohesion - 1), NA_real_),
                   up95ci_cohesion_tdist = mean_cohesion + (t_cohesion * sem_cohesion),
                   low95ci_cohesion_tdist = mean_cohesion - (t_cohesion * sem_cohesion),
                   mean_cohesionz = mean(soc_cohesion_z, na.rm = TRUE),
                   sd_cohesionz = sd(soc_cohesion_z, na.rm = TRUE),
                   sem_cohesionz = sd_cohesionz / sqrt(n_cohesion),
                   t_cohesionz = ifelse(n_cohesion > 1, qt(0.975, df = n_cohesion - 1), NA_real_),
                   up95ci_cohesionz_tdist = mean_cohesionz + (t_cohesionz * sem_cohesionz),
                   low95ci_cohesionz_tdist = mean_cohesionz - (t_cohesionz * sem_cohesionz),
                   na_cohesion = sum(is.na(soc_cohesion)),
                   
                   #Social Control:
                   mean_control = mean(soc_control, na.rm = TRUE),
                   sd_control = sd(soc_control, na.rm = TRUE), 
                   n_control = sum(!is.na(soc_control)),
                   sem_control = sd_control / sqrt(n_control),
                   t_control = ifelse(n_control > 1, qt(0.975, df = n_control - 1), NA_real_),
                   up95ci_control_tdist = mean_control + (t_control * sem_control),
                   low95ci_control_tdist = mean_control - (t_control * sem_control),
                   mean_controlz = mean(soc_control_z, na.rm = TRUE),
                   sd_controlz = sd(soc_control_z, na.rm = TRUE),
                   sem_controlz = sd_controlz / sqrt(n_control),
                   t_controlz = ifelse(n_control > 1, qt(0.975, df = n_control - 1), NA_real_),
                   up95ci_controlz_tdist = mean_controlz + (t_controlz * sem_controlz),
                   low95ci_controlz_tdist = mean_controlz - (t_controlz * sem_controlz),
                   na_control = sum(is.na(soc_control)),
                   
                   #Collective Efficacy:
                   mean_coleff = mean(coleff, na.rm = TRUE),
                   sd_coleff = sd(coleff, na.rm = TRUE),
                   n_coleff = sum(!is.na(coleff)),
                   sem_coleff = sd_coleff / sqrt(n_coleff),
                   t_coleff = ifelse(n_coleff > 1, qt(0.975, df = n_coleff - 1), NA_real_),
                   up95ci_coleff_tdist = mean_coleff + (t_coleff * sem_coleff),
                   low95ci_coleff_tdist = mean_coleff - (t_coleff * sem_coleff),
                   mean_coleffz = mean(coleff_z, na.rm = TRUE),
                   sd_coleffz = sd(coleff_z, na.rm = TRUE),
                   sem_coleffz = sd_coleffz / sqrt(n_coleff),
                   t_coleffz = ifelse(n_coleff > 1, qt(0.975, df = n_coleff - 1), NA_real_),
                   up95ci_coleffz_tdist = mean_coleffz + (t_coleffz * sem_coleffz),
                   low95ci_coleffz_tdist = mean_coleffz - (t_coleffz * sem_coleffz),
                   na_coleff = sum(is.na(coleff)),
                   
                   #Perceived Violence:
                   mean_percviol = mean(percviol, na.rm = TRUE),
                   sd_percviol = sd(percviol, na.rm = TRUE),
                   n_percviol = sum(!is.na(percviol)),
                   sem_percviol = sd_percviol / sqrt(n_percviol),
                   t_percviol = ifelse(n_percviol > 1, qt(0.975, df = n_percviol - 1), NA_real_),
                   up95ci_percviol_tdist = mean_percviol + (t_percviol * sem_percviol),
                   low95ci_percviol_tdist = mean_percviol - (t_percviol * sem_percviol),
                   na_percviol = sum(is.na(percviol)),
                   
                   #Experienced Criminal Vic:
                   mean_expcrime = mean(expcrime, na.rm = TRUE),
                   sd_expcrime = sd(expcrime, na.rm = TRUE),
                   n_expcrime = sum(!is.na(expcrime)),
                   sem_expcrime = sd_expcrime / sqrt(n_expcrime),
                   t_expcrime = ifelse(n_expcrime > 1, qt(0.975, df = n_expcrime - 1), NA_real_),
                   up95ci_expcrime_tdist = mean_expcrime + (t_expcrime * sem_expcrime),
                   low95ci_expcrime_tdist = mean_expcrime - (t_expcrime * sem_expcrime),
                   na_expcrime = sum(is.na(expcrime))
                   ) %>%
  left_join(kc_combsurv_geoid) 
Show code
kc_combsurv_agganal %>%
  select(NBHD, n, mean_cohesion, na_cohesion, mean_control, na_control, mean_coleff, na_coleff,
         mean_percviol, na_percviol, mean_expcrime, na_expcrime) %>%
  zap_label %>%
  gt() %>%
  fmt_number(
    columns = c(mean_cohesion, mean_control, mean_coleff, mean_percviol, mean_expcrime),
    decimals = 2) %>%
  cols_label(
    NBHD = "Neighborhood",
    n = "Total Observations",
    mean_cohesion = "Mean Soc. Cohesion",
    na_cohesion = "Missing Soc. Cohesion",
    mean_control = "Mean Soc. Control",
    na_control = "Missing Soc. Control",
    mean_coleff = "Mean Col. Efficacy",
    na_coleff = "Missing Col. Efficacy",
    mean_percviol = "Mean Perc. Violence",
    na_percviol = "Missing Perc. Violence",
    mean_expcrime = "Mean Exp. Victimization",
    na_expcrime = "Missing Exp. Victimization"
  ) %>%
  tab_header(
    title = "Neighborhood Survey Data Summary") %>%
   tab_options(
     table.width = pct(80),
     container.height = px(500),
     container.overflow.y = TRUE)
Neighborhood Survey Data Summary
Neighborhood Total Observations Mean Soc. Cohesion Missing Soc. Cohesion Mean Soc. Control Missing Soc. Control Mean Col. Efficacy Missing Col. Efficacy Mean Perc. Violence Missing Perc. Violence Mean Exp. Victimization Missing Exp. Victimization
1 9 4.15 5 4.70 3 4.42 5 0.06 2 0.33 3
2 11 3.87 2 4.95 3 4.47 3 0.09 2 0.36 0
3 8 3.60 0 4.94 1 4.29 1 0.07 2 0.25 0
5 6 3.87 0 4.73 0 4.30 0 0.10 0 0.00 0
6 7 4.03 1 4.77 1 4.78 2 0.06 0 0.29 0
7 7 3.86 0 4.94 0 4.40 0 0.00 2 0.00 0
8 7 3.52 2 4.12 2 3.82 2 0.14 0 0.86 0
9 2 3.40 0 4.90 0 4.15 0 0.00 0 0.50 0
10 3 4.13 0 4.30 1 4.40 1 0.13 0 0.00 0
11 14 3.36 0 3.82 3 3.51 3 0.93 5 1.92 1
12 23 4.08 0 4.63 1 4.35 1 0.17 4 1.52 2
13 9 3.33 0 4.07 3 3.67 3 0.50 1 0.50 1
14 25 2.73 2 3.70 7 3.22 7 1.20 11 1.59 3
15 5 4.10 1 4.20 1 4.15 1 0.50 3 0.75 1
16 13 3.65 1 4.13 1 3.89 1 0.52 3 1.50 1
17 10 3.98 0 4.42 1 4.24 1 0.42 2 1.78 1
18 7 4.20 0 4.36 2 4.34 2 0.97 1 1.43 0
19 8 3.40 2 4.51 1 3.90 2 0.09 1 0.62 0
21 21 2.99 4 3.31 5 3.21 7 0.51 10 0.60 1
22 2 NaN 2 NaN 2 NaN 2 NaN 2 0.00 0
23 3 3.53 0 4.40 0 3.97 0 0.20 2 0.00 1
24 1 2.60 0 2.20 0 2.40 0 2.20 0 2.00 0
25 7 3.50 3 3.77 1 3.85 3 0.10 3 0.50 1
26 17 4.23 2 4.79 3 4.51 4 0.16 7 1.00 2
27 10 3.80 1 4.52 0 4.18 1 0.06 3 0.50 0
28 11 3.90 1 4.42 2 4.40 3 0.30 3 1.09 0
29 14 4.44 0 4.89 3 4.65 3 0.44 3 1.42 2
30 2 3.80 0 3.60 1 3.60 1 0.90 0 1.00 0
31 5 3.47 2 4.40 1 4.07 2 0.47 2 1.00 1
33 18 3.59 2 3.74 5 3.76 7 0.36 7 1.29 1
34 20 3.58 1 4.30 2 4.03 3 0.14 7 1.05 1
35 10 3.38 1 3.25 6 3.30 6 0.20 2 1.10 0
36 10 3.76 1 3.67 4 3.77 4 0.60 4 1.50 0
37 3 3.93 0 4.20 1 4.15 1 0.00 1 1.33 0
38 10 3.26 0 3.73 4 3.65 4 0.56 5 1.00 2
39 8 3.77 2 3.87 5 3.57 5 0.05 4 1.12 0
40 4 3.55 0 3.67 1 3.67 1 0.13 1 0.00 0
41 13 3.92 1 4.36 4 4.09 4 0.28 3 1.23 0
42 9 3.17 1 3.70 3 3.42 3 0.63 3 0.86 2
43 13 3.38 2 3.90 3 3.60 4 0.38 4 0.50 1


The above table presents the neighborhood-level means for the key scales we constructed and analyzed in the previous chapters.1 We also added the total number of observations and total number of missing observations for each scale. Comparing these rows (and subtracting the missing from total) will give you a sense of the effective (sub)sample size from which each of these means were calculated. Neighborhoods with fewer effective observations will generally produce noisier mean values than those with more effective observations.

We can help build this intuition by plotting these mean values with their 95% confidence intervals based on a t-distribution.2 Additionally, we will show semi-transparent data points and box plots to visualize the degree of variation in individual responses used to comprise each aggregated neighborhood-level “observation.”

Show code
library(ggplot2)
library(dplyr)
library(rlang) # For {{ }} operator

### Neighborhood Plot Function
nbhd_plot <- function(
    data_agg,                   # Aggregated dataset (e.g., kc_combsurv_agganal)
    data_ind,                   # Indivdiual dataset for hline (e.g., kc_combsurv_ceanal)
    y_var,                      # Main y-variable (e.g., mean_cohesion)
    ymin_var,                   # Lower CI variable (e.g., low95ci_cohesion_tdist)
    ymax_var,                   # Upper CI variable (e.g., up95ci_cohesion_tdist)
    filter_na_var,              # Variable to check for NAs before filtering (e.g., sem_cohesion)
    hline_var_ind,              # Variable in raw data for hline (e.g., soc_cohesion)
    plot_title = NULL,                 # Title for the plot
    group_var = NBHD,           # Grouping variable for x-axis (defaults to NBHD)
    order_var = n,              # Variable to order by on x-axis (defaults to n)
    point_color = "black",  # Color for geom_pointrange
    y_axis_breaks = NULL,
    y_coord_limits = NULL,
    x_axis_title = NULL
) {

  # Prepare data for plotting (filtering and creating the n_lookup)
  plot_data <- data_agg %>%
    filter(!is.na({{ filter_na_var }}))

  # n_lookup needs to be created from the data that will be used in ggplot
  # and specifically from the 'group_var' and 'order_var' columns.
  # We need to ensure 'order_var' is treated as a symbol for creating the named vector.
  # This assumes 'group_var' and 'order_var' exist in 'plot_data'.
  # If 'order_var' is 'n', then this works.
  n_lookup_dynamic <- setNames(
    plot_data[[as_name(enquo(order_var))]], # Get the column specified by order_var
    plot_data[[as_name(enquo(group_var))]]  # Get the column specified by group_var
  )

  ggplot(plot_data, aes(x = reorder({{ group_var }}, {{ order_var }}), y = {{ y_var }})) +
    geom_pointrange(aes(ymin = {{ ymin_var }}, ymax = {{ ymax_var }}), color = point_color) +
    geom_hline(data = data_ind, aes(yintercept = mean({{ hline_var_ind }}, na.rm = TRUE)),
               linetype = 2) +
    scale_y_continuous(breaks = y_axis_breaks) +
    coord_cartesian(ylim = y_coord_limits) +
    scale_x_discrete(
      labels = function(x_breaks) {
        # x_breaks are the actual values from the 'group_var' column in their plotted order
        as.character(n_lookup_dynamic[x_breaks])
      },
      name = x_axis_title
    ) +
    theme_minimal(base_size = 10) +
    labs(title = plot_title) +
    theme(
      panel.grid = ggplot2::element_blank(),
      axis.title.y = ggplot2::element_blank()
    )
}

### Neighborhood Box Plot Function with Data Points & Mean Overlay
nbhd_box_plot <- function(
    data_agg,                   # Aggregated dataset (for means and CIs)
    data_ind,                   # Individual dataset for box plots and points
    y_var_ind,                  # Individual-level y-variable (e.g., soc_cohesion)
    y_var_agg,                  # Aggregated mean variable (e.g., mean_cohesion)
    ymin_var,                   # Lower CI variable (e.g., low95ci_cohesion_tdist)
    ymax_var,                   # Upper CI variable (e.g., up95ci_cohesion_tdist)
    filter_na_var,              # Variable to check for NAs before filtering
    hline_var_ind,              # Variable in raw data for hline
    plot_title = NULL,
    group_var = NBHD,
    order_var = n,
    box_color = "black",        # Box outline color
    box_alpha = 0.8,            # Box outline transparency
    point_color = "gray60",
    point_alpha = 0.4,
    point_size = 0.8,
    y_axis_breaks = NULL,
    y_coord_limits = NULL,
    x_axis_title = NULL
) {

  # Prepare aggregated data for ordering and means
  agg_data <- data_agg %>%
    filter(!is.na({{ filter_na_var }}))
  
  # Prepare individual data, filtering to neighborhoods in agg_data
  ind_data <- data_ind %>%
    filter({{ group_var }} %in% agg_data[[as_name(enquo(group_var))]]) %>%
    filter(!is.na({{ y_var_ind }}))
  
  # Create n_lookup for x-axis labels
  n_lookup_dynamic <- setNames(
    agg_data[[as_name(enquo(order_var))]],
    agg_data[[as_name(enquo(group_var))]]
  )
  
  # Get neighborhood order from aggregated data
  nbhd_order <- agg_data %>%
    arrange({{ order_var }}) %>%
    pull({{ group_var }})
  
  # Create the plot
  ggplot() +
    
    # Semi-transparent individual data points
    geom_point(data = ind_data,
               aes(x = factor({{ group_var }}, levels = nbhd_order), 
                   y = {{ y_var_ind }}),
               color = point_color, alpha = point_alpha, size = point_size,
               position = position_jitter(width = 0.2, height = 0)) +
    
    # Box plots with transparent fill
    geom_boxplot(data = ind_data,
                 aes(x = factor({{ group_var }}, levels = nbhd_order), 
                     y = {{ y_var_ind }}),
                 fill = "transparent", 
                 color = alpha(box_color, box_alpha),
                 outlier.shape = NA) +
    
    # Point-interval overlay (same color as box)
    geom_pointrange(data = agg_data,
                    aes(x = factor({{ group_var }}, levels = nbhd_order),
                        y = {{ y_var_agg }},
                        ymin = {{ ymin_var }}, 
                        ymax = {{ ymax_var }}),
                    color = box_color) +  # Same color as box outline
    
    # Overall mean horizontal line
    geom_hline(data = data_ind,
               aes(yintercept = mean({{ hline_var_ind }}, na.rm = TRUE)),
               linetype = 2) +
    
    scale_y_continuous(breaks = y_axis_breaks) +
    coord_cartesian(ylim = y_coord_limits) +
    scale_x_discrete(
      labels = function(x_breaks) {
        as.character(n_lookup_dynamic[x_breaks])
      },
      name = x_axis_title
    ) +
    theme_minimal(base_size = 10) +
    labs(title = plot_title) +
    theme(
      panel.grid = element_blank(),
      axis.title.y = element_blank()
    )
}
Show code
# Social Cohesion 
# nbhd_plot(data_agg = kc_combsurv_agganal,
#           data_ind = kc_combsurv_ceanal,
#           y_var = mean_cohesion,
#           ymin_var = low95ci_cohesion_tdist,
#           ymax_var = up95ci_cohesion_tdist,
#           filter_na_var = sem_cohesion,
#           hline_var_ind = soc_cohesion,
#           plot_title = "Neighborhood-Level Social Cohesion",
#           group_var = NBHD,
#           order_var = n,
#           point_color = "#440154FF",
#           y_axis_breaks = c(1,2,3,4,5),
#           y_coord_limits = c(0.5, 5.5),
#           x_axis_title = "Sample Size (n)"
# )

# Social Cohesion with mean overlay
nbhd_box_plot(
  data_agg = kc_combsurv_agganal, 
  data_ind = kc_combsurv_ceanal,
  y_var_ind = soc_cohesion,
  y_var_agg = mean_cohesion,
  ymin_var = low95ci_cohesion_tdist,
  ymax_var = up95ci_cohesion_tdist,
  filter_na_var = sem_cohesion,
  hline_var_ind = soc_cohesion,
  plot_title = "Neighborhood-Level Social Cohesion",
  box_color = "#440154FF",      # Same color for box and point-interval
  box_alpha = .2,
  point_color = "#440154FF",
  point_alpha = 0.3,
  y_axis_breaks = c(1,2,3,4,5),
  y_coord_limits = c(0.5, 5.5),
  x_axis_title = "Sample Size (n)"
)

Show code
# nbhd_plot(data_agg = kc_combsurv_agganal, 
#           data_ind = kc_combsurv_ceanal,
#           y_var = mean_control,
#           ymin_var = low95ci_control_tdist,
#           ymax_var = up95ci_control_tdist,
#           filter_na_var = sem_control,
#           hline_var_ind = soc_control,
#           plot_title = "Neighborhood-Level Social Control",
#           group_var = NBHD,
#           order_var = n,
#           point_color = "#FDE725FF",
#           y_axis_breaks = c(1,2,3,4,5,6),
#           y_coord_limits = c(0.5, 6.5),
#           x_axis_title = "Sample Size (n)"
# )

nbhd_box_plot(
  data_agg = kc_combsurv_agganal, 
  data_ind = kc_combsurv_ceanal,
  y_var_ind = soc_control,
  y_var_agg = mean_control,
  ymin_var = low95ci_control_tdist,
  ymax_var = up95ci_control_tdist,
  filter_na_var = sem_control,
  hline_var_ind = soc_control,
  plot_title = "Neighborhood-Level Social Control",
  box_color = "#FDE725FF",
  box_alpha = 0.2,
  point_color = "#FDE725FF",
  point_alpha = 0.3,
  y_axis_breaks = c(1,2,3,4,5,6),
  y_coord_limits = c(0.5, 6.5),
  x_axis_title = "Sample Size (n)"
)

Show code
# nbhd_plot(data_agg = kc_combsurv_agganal, 
#           data_ind = kc_combsurv_ceanal,
#           y_var = mean_coleffz,
#           ymin_var = low95ci_coleffz_tdist,
#           ymax_var = up95ci_coleffz_tdist,
#           filter_na_var = sem_coleffz,
#           hline_var_ind = coleff_z,
#           plot_title = "Neighborhood-Level Collective Efficacy",
#           group_var = NBHD,
#           order_var = n,
#           point_color = "#21908CFF",
#           y_axis_breaks = c(-4,-3,-2,-1,0,1,2),
#           y_coord_limits = c(-4, 2),
#           x_axis_title = "Sample Size (n)"
# )

nbhd_box_plot(
  data_agg = kc_combsurv_agganal, 
  data_ind = kc_combsurv_ceanal,
  y_var_ind = coleff_z,
  y_var_agg = mean_coleffz,
  ymin_var = low95ci_coleffz_tdist,
  ymax_var = up95ci_coleffz_tdist,
  filter_na_var = sem_coleffz,
  hline_var_ind = coleff_z,
  plot_title = "Neighborhood-Level Collective Efficacy (stdz)",
  box_color = "#21908CFF",
  box_alpha = 0.2,
  point_color = "#21908CFF",
  point_alpha = 0.3,
  y_axis_breaks = c(-4,-3,-2,-1,0,1,2),
  y_coord_limits = c(-4, 2),
  x_axis_title = "Sample Size (n)"
)

Show code
# nbhd_plot(data_agg = kc_combsurv_agganal, 
#           data_ind = kc_combsurv_ceanal,
#           y_var = mean_percviol,
#           ymin_var = low95ci_percviol_tdist,
#           ymax_var = up95ci_percviol_tdist,
#           filter_na_var = sem_percviol,
#           hline_var_ind = percviol,
#           plot_title = "Neighborhood-Level Perceived Violence",
#           group_var = NBHD,
#           order_var = n,
#           point_color = "#5DC863FF",
#           y_axis_breaks = c(0,1,2,3),
#           y_coord_limits = c(-0.5, 3.5),
#           x_axis_title = "Sample Size (n)"
# )

nbhd_box_plot(
  data_agg = kc_combsurv_agganal, 
  data_ind = kc_combsurv_ceanal,
  y_var_ind = percviol,
  y_var_agg = mean_percviol,
  ymin_var = low95ci_percviol_tdist,
  ymax_var = up95ci_percviol_tdist,
  filter_na_var = sem_percviol,
  hline_var_ind = percviol,
  plot_title = "Neighborhood-Level Perceived Violence",
  box_color = "#5DC863FF",
  box_alpha = 0.2,
  point_color = "#5DC863FF",
  point_alpha = 0.3,
  y_axis_breaks = c(0,1,2,3),
  y_coord_limits = c(-0.5, 3.5),
  x_axis_title = "Sample Size (n)"
)

Show code
# nbhd_plot(data_agg = kc_combsurv_agganal, 
#           data_ind = kc_combsurv_ceanal,
#           y_var = mean_expcrime,
#           ymin_var = low95ci_expcrime_tdist,
#           ymax_var = up95ci_expcrime_tdist,
#           filter_na_var = sem_expcrime,
#           hline_var_ind = expcrime,
#           plot_title = "Neighborhood-Level Victimization",
#           group_var = NBHD,
#           order_var = n,
#           point_color = "#3B528BFF",
#           y_axis_breaks = c(0,1,2,3,4),
#           y_coord_limits = c(-0.5, 4.5),
#           x_axis_title = "Sample Size (n)"
# )

nbhd_box_plot(
  data_agg = kc_combsurv_agganal, 
  data_ind = kc_combsurv_ceanal,
  y_var_ind = expcrime,
  y_var_agg = mean_expcrime,
  ymin_var = low95ci_expcrime_tdist,
  ymax_var = up95ci_expcrime_tdist,
  filter_na_var = sem_expcrime,
  hline_var_ind = expcrime,
  plot_title = "Neighborhood-Level Victimization",
  box_color = "#3B528BFF",
  box_alpha = 0.2,
  point_color = "#3B528BFF",
  point_alpha = 0.3,
  y_axis_breaks = c(0,1,2,3,4),
  y_coord_limits = c(-0.5, 4.5),
  x_axis_title = "Sample Size (n)"
)

In the above plots, the point-intervals clearly show the inverse relationship between sub-sample size and uncertainty in estimates as expected, with extremely wide 95% CIs for aggregated neighborhood estimates comprised of a few responses (left side of each plot) that tend to narrow substantially as the number of observations per neighborhood increases.

With that said, the semi-transparent data point spreads and box plot widths behind the point-intervals also shows that the increased “certainty” presumably gained via larger sample sizes masks substantial within-neighborhood heterogeneity in individual responses. You can also see some of the limits in trying to estimate uncertainty from the observed data, especially with the neighborhood-level victimization. For some small sub-samples, (e.g., n = 2 - 7), everyone surveyed in that neighborhood reported no experiences with criminal victimization and thus, there are no uncertainty estimates. We could estimate a simple multilevel model to get more reasonable (and more conservative) uncertainty estimates, but we’ll forgo that for now in order to actually map these values.

6.0.5 Mapping Survey Data

Since the aggregate survey data above only has the sampled neighborhoods, we first need to merge the data with the full crime date data.

Show code
kc_combsurv_agganal_sf <- kc_combsurv_agganal %>%
  full_join(kc_combsurv_geoid) %>%
  st_sf()

We can start by plotting some basic descriptive maps. First, let’s plot the number of respondents for each neighborhood sampled.

Show code
nsamp_pal <- colorNumeric(palette = "viridis",
                          domain = c(1, 25),
                          reverse = TRUE,
                          na.color = "white")

# Get the bounding box of your data

surv_response_map <- 
leaflet(kc_combsurv_agganal_sf) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%  # Add Positron map tiles
  setView(lng = -94.5786, lat = 39.0997, zoom = 10) %>%  # KC coordinates  
  addPolygons(
    fillColor = ~nsamp_pal(pmin(n, 25)),
    weight = 1,
    opacity = 1,
    color = "black",
    dashArray = "",
    fillOpacity = 0.5,
    highlight = highlightOptions(
      weight = 3,
      color = "#666",
      dashArray = "",
      fillOpacity = 0.5,
      bringToFront = TRUE),
    popup = ~paste("<b>Neighborhood Information</b>",
                   "<br>Neighborhood:", NBHNAME,
                   "<br>Population (2020):", SUM_POP20,
                   "<br>Sample:", sample_fact,
                   "<br>Violent Crime (per 1,000):", VC_1000_Rate,
                   "<br>Property Crime (per 1,000):", PC_1000_Rate,
                   "<br>Total Crime (per 1,000):", Crime_1000_Rate,
                   "<hr><b>Survey Information</b>",
                   "<br>Survey Respondents:", n
                   ),
    label = ~NBHNAME
  ) %>%
  # fitBounds(lng1 = bbox[1], lat1 = bbox[2], 
  #           lng2 = bbox[3], lat2 = bbox[4]) %>%  
  addLegend(pal = nsamp_pal, 
            values = c(1, 25), 
            opacity = 0.7, 
            title = "Survey Respondents",
            position = "bottomright")

surv_response_map