In the previous chapter, we analyzed bivariate relationships between item- and scale-level measures of collective efficacy and perceived and experienced criminal behaviors in residents’ neighborhoods. We found that individuals who perceive their neighborhood to be high in collective efficacy (at least for certain items meant to measure its key sub-dimensions, social cohesion and informal social control) generally perceive less crime in their neighborhood and, to a lesser extent, report fewer direct experiences with criminal victimization. Of course, the theory of collective efficacy was not developed to explain individual-level differences in neighborhood perceptions and crime. Rather, it was conceived as a collective, neighborhood-level corollary to the classic psychological concept of self-efficacy. Instead of emphasizing an individual’s belief in their ability to succeed in specific situations or accomplish a specific task, collective efficacy is meant to capture a community or neighborhood’s collective “mutual trust or willingness to intervene for the common good…” (Sampson et al., 1997: p. 919), such as controlling crime and antisocial behavior. With this in mind, we turn to visualizing neighborhood-level variations in collective efficacy and crime below.
6.0.2 Load the Data
First, we need to load the crime rate data with geographic information for mapping and the survey data with a neighborhood identifier that corresponds to the neighborhoods in the crime rate data (see Appendix II section 11.0.1). We will also load the pre-processed analytic data from the prior chapter that includes our collective efficacy and crime-related items, sub-scales, and scales.
We can start by mapping the geographic characteristics of the data, including the neighborhoods that were sampled. The sf package is designed to work with a host of different plotting packages. Although sf has some built-in plotting features, and we typically prefer the generality of ggplot2 for making plots in R, the leaflet package seems to provide access to one of the most general-purpose mapping tools. So we’ll use it below to allow for some degree of interactivity.
First, we can map whether the neighborhood was sampled and whether it was part of the random or high crime sample.
In the map above, hovering over a neighborhood reveals its name, and clicking on it reveals additional information, including the population from the 2020 census, whether it was sampled and, if so, in which sample (random; high violent crime), and the crime rate (violent, property, and total) in 2024. You will notice that neighborhoods from the “High Crime Sample” are clustered around the geographic center of the city and just south of the central business district (Missouri River to the north and 31st Street to the south).
Although redundant with information Dr. Kotlaja already has, we can also map the crime rate data. We present these maps in separate tabs below. We have capped the maximum values at a round number that is close to 2 standard deviations above the mean crime rate for all crime, violent crime, and property crime, respectively.
The maps above largely reproduce the story evident in the sampling map. Neighborhoods with higher crime are generally concentrated just south of the central business district. Of course, since the “high crime” sample was selected based on violent crime rates, it includes all but a few of the neighborhoods with the highest violent crime rates (notable exceptions that were not sampled include the Hospital Hill, Northeast Industrial District, and Blue Valley Industrial neighborhoods).
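The capping rule described above can be sketched in base R. This is only an illustration: the data frame `crime_toy`, its column `violent_rate`, and the made-up values are hypothetical, and rounding the threshold up to the nearest 10 is an assumption, since the report only says the cap is a round number close to 2 standard deviations above the mean.

```r
# Hypothetical toy data: one row per neighborhood (values are made up)
crime_toy <- data.frame(
  nbhd_name    = LETTERS[1:10],
  violent_rate = c(2, 3, 4, 4, 5, 5, 6, 6, 7, 98)
)

# Threshold: mean + 2 SD, rounded up to the nearest 10
# (the rounding rule is an assumption, not the report's stated method)
raw_cap   <- mean(crime_toy$violent_rate, na.rm = TRUE) +
  2 * sd(crime_toy$violent_rate, na.rm = TRUE)
round_cap <- ceiling(raw_cap / 10) * 10

# Cap values above the threshold; the original column is left untouched
crime_toy$violent_rate_capped <- pmin(crime_toy$violent_rate, round_cap)
```

Keeping the uncapped column alongside the capped one means a map could color neighborhoods by the capped value while still reporting the true rate in a popup.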
6.0.4 Aggregating Survey Data
The above maps simply plot administrative data. More relevant to our purposes is mapping the survey data on collective efficacy, perceived crime, and experienced victimization at the neighborhood level. To do this, we first need to aggregate the individual-level data to the neighborhood level and then map them much as we did above. But first, before reinforcing the formal neighborhood boundaries as defined by the Kansas City government, it is worth examining what survey respondents identified as their neighborhood.
6.0.4.1 Residents’ Perceived Neighborhoods
As Dr. Kotlaja anticipated, residents do not necessarily identify their neighborhood by its official name or boundaries. A simple way to assess this is to identify all unique combinations of the NBHD and Q1_1_Text (self-reported neighborhood name) variables in the survey data and then compare them with the official neighborhood names from the crime rate data.
| Row | NBHD | Q1_1_Text | Official ID | Official name |
|---|---|---|---|---|
|  |  | North Lakes is the subdivision but I think maybe Tiffany Springs is the area? | 221 | The Coves |
| 294 | 7 | The Coves | 221 | The Coves |
| 168 | 8 | Woodfield | 205 | Ravenwood-Somerset |
| 171 | 8 | Ravenwood Somerset | 205 | Ravenwood-Somerset |
| 256 | 8 | Somerset | 205 | Ravenwood-Somerset |
| 295 | 8 | Ravenwood-Somerset | 205 | Ravenwood-Somerset |
| 296 | 8 | Ravenwood-Sommerset | 205 | Ravenwood-Somerset |
| 298 | 8 | Carriage Hill Estates | 205 | Ravenwood-Somerset |
| 299 | 9 | Maple Park | 210 | Maple Park |
| 301 | 10 | winnwood gardens | 207 | Winnwood Gardens |
| 302 | 10 | Winnwood Gardens | 207 | Winnwood Gardens |
| 37 | 11 | Lykins | 17 | Lykins |
| 51 | 11 | Northeast | 17 | Lykins |
| 182 | 11 |  | 17 | Lykins |
| 211 | 11 | Old Northeast | 17 | Lykins |
| 237 | 11 | Pendleton heights | 17 | Lykins |
| 242 | 11 | Lykins or ne | 17 | Lykins |
| 20 | 12 | Longfellow | 12 | Longfellow |
| 22 | 12 |  | 12 | Longfellow |
| 23 | 12 | Longfellow heights | 12 | Longfellow |
| 222 | 12 | Lomgfellow | 12 | Longfellow |
| 34 | 13 |  | 62 | Palestine East |
| 36 | 13 | Palestine | 62 | Palestine East |
| 225 | 13 | East Palestine | 62 | Palestine East |
| 229 | 13 | The 30s | 62 | Palestine East |
| 308 | 13 | Palestine East | 62 | Palestine East |
| 310 | 13 | Palestin area | 62 | Palestine East |
| 1 | 14 | Blue Valley | 31 | South Blue Valley |
| 2 | 14 |  | 31 | South Blue Valley |
| 3 | 14 | Vanbrunt | 31 | South Blue Valley |
| 6 | 14 | Blue valley | 31 | South Blue Valley |
| 95 | 14 | Van brunt | 31 | South Blue Valley |
| 152 | 14 | East side | 31 | South Blue Valley |
| 316 | 14 | South Blue Valley | 31 | South Blue Valley |
| 44 | 15 |  | 60 | Oak Park Southeast |
| 250 | 15 | Oak Park | 60 | Oak Park Southeast |
| 318 | 15 | Oak park | 60 | Oak Park Southeast |
| 4 | 16 |  | 76 | North Hyde Park |
| 35 | 16 | Hyde park | 76 | North Hyde Park |
| 107 | 16 | North Hyde park | 76 | North Hyde Park |
| 145 | 16 | Hyde Park | 76 | North Hyde Park |
| 177 | 16 | North Hyde Park | 76 | North Hyde Park |
| 320 | 16 | North hyde park | 76 | North Hyde Park |
| 321 | 16 | North Hydepark | 76 | North Hyde Park |
| 17 | 17 | Art institute/ south Moreland | 80 | Southmoreland |
| 46 | 17 | Southmoreland | 80 | Southmoreland |
| 50 | 17 | Souhmoreland | 80 | Southmoreland |
| 123 | 17 | South Moreland / Westport | 80 | Southmoreland |
| 164 | 17 | Westport | 80 | Southmoreland |
| 45 | 18 | River Market | 2 | River Market |
| 106 | 18 | River market | 2 | River Market |
| 322 | 18 |  | 2 | River Market |
| 324 | 18 | Market Station | 2 | River Market |
| 52 | 19 | Creek wood commons | 192 | Davidson |
| 169 | 19 | South Oakwood | 192 | Davidson |
| 238 | 19 | Williamsburg | 192 | Davidson |
| 326 | 19 | Cooley Highlands | 192 | Davidson |
| 327 | 19 |  | 192 | Davidson |
| 330 | 19 | Creekwood | 192 | Davidson |
| 10 | 21 | Ruskin Heights | 171 | Ruskin Heights |
| 60 | 21 |  | 171 | Ruskin Heights |
| 64 | 21 | Ruskin heights | 171 | Ruskin Heights |
| 85 | 21 | Ruskin | 171 | Ruskin Heights |
| 210 | 21 | Ruskin heights/ Hickman mills area | 171 | Ruskin Heights |
| 259 | 21 | Terrace | 171 | Ruskin Heights |
| 260 | 21 | Off manchester | 171 | Ruskin Heights |
| 273 | 22 |  | 128 | East Meyer 6 |
| 276 | 22 | Roseville | 128 | East Meyer 6 |
| 269 | 23 |  | 131 | Brown Estates |
| 270 | 23 | Brown estates | 131 | Brown Estates |
| 154 | 24 |  | 178 | Little Blue |
| 9 | 25 | Hickman Mills | 169 | Hickman Mills South |
| 83 | 25 | Hickman mills south | 169 | Hickman Mills South |
| 84 | 25 |  | 169 | Hickman Mills South |
| 332 | 25 | holiday hills | 169 | Hickman Mills South |
| 7 | 26 | Waldo | 106 | Tower Homes |
| 62 | 26 | Tower park | 106 | Tower Homes |
| 129 | 26 | Rock hill garden | 106 | Tower Homes |
| 138 | 26 |  | 106 | Tower Homes |
| 150 | 26 | Tower Homes | 106 | Tower Homes |
| 186 | 26 | Tower | 106 | Tower Homes |
| 202 | 26 | Tower homes | 106 | Tower Homes |
| 209 | 26 | Tower Park | 106 | Tower Homes |
| 253 | 26 | Rockhill Gardens | 106 | Tower Homes |
| 15 | 27 | Linden Hills | 142 | Linden Hills And Indian Heights |
| 29 | 27 | Linden hills | 142 | Linden Hills And Indian Heights |
| 76 | 27 | Linden Hill | 142 | Linden Hills And Indian Heights |
| 146 | 27 | Linden Hiils | 142 | Linden Hills And Indian Heights |
| 181 | 27 | Linden Hills and Indian heights | 142 | Linden Hills And Indian Heights |
| 67 | 28 | Waldo Homes | 110 | Waldo Homes |
| 68 | 28 | Waldo | 110 | Waldo Homes |
| 142 | 28 |  | 110 | Waldo Homes |
| 335 | 28 | Rock hill Gardens | 110 | Waldo Homes |
| 14 | 29 | Marlboro Height | 95 | Morningside |
| 53 | 29 | Morningside | 95 | Morningside |
| 54 | 29 | Wornell Homestead | 95 | Morningside |
| 56 | 29 | Morninside | 95 | Morningside |
| 338 | 29 | Brookside | 95 | Morningside |
| 342 | 29 | Morningside Neighborhood Is our neighborhood association | 95 | Morningside |
| 43 | 30 |  | 186 | Richards Gebaur |
| 90 | 30 | Grandview | 186 | Richards Gebaur |
| 11 | 31 | Paseo West | 5 | Paseo West |
| 12 | 31 |  | 5 | Paseo West |
| 89 | 31 | West Paseo | 5 | Paseo West |
| 272 | 31 | Rosehill Townhomes | 5 | Paseo West |
| 345 | 31 | Paseowest | 5 | Paseo West |
| 24 | 33 | key coalition | 56 | Key Coalition |
| 25 | 33 | Key Coalition | 56 | Key Coalition |
| 26 | 33 | Key Coaltion | 56 | Key Coalition |
| 27 | 33 | Not that I know of | 56 | Key Coalition |
| 28 | 33 | Key coalition | 56 | Key Coalition |
| 109 | 33 | Ivanhoe | 56 | Key Coalition |
| 161 | 33 |  | 56 | Key Coalition |
| 215 | 33 | Spring Hill | 56 | Key Coalition |
| 16 | 34 | Ivanhoe ne | 55 | Ivanhoe Northeast |
| 39 | 34 |  | 55 | Ivanhoe Northeast |
| 40 | 34 | Ivanhoe Gardens | 55 | Ivanhoe Northeast |
| 41 | 34 | Ivanhoe | 55 | Ivanhoe Northeast |
| 233 | 34 | Ivahoe | 55 | Ivanhoe Northeast |
| 38 | 35 | Dunbar gardens | 66 | Dunbar |
| 108 | 35 | Dunbar | 66 | Dunbar |
| 151 | 35 |  | 66 | Dunbar |
| 348 | 35 | Leeds | 66 | Dunbar |
| 48 | 36 |  | 81 | Old Westport |
| 57 | 36 | Valentine | 81 | Old Westport |
| 112 | 36 | Westport | 81 | Old Westport |
| 114 | 36 | Westport - nbhd org. name is Heart of Westport | 81 | Old Westport |
| 258 | 36 | Westport area | 81 | Old Westport |
| 351 | 36 | WESTPORT ENTERTAINMENT DISTRICT | 81 | Old Westport |
| 352 | 37 | Mount hope | 51 | Mount Hope |
| 353 | 37 | Boston Heights/Mount Hope | 51 | Mount Hope |
| 354 | 37 | I think it’s Mt. Hope | 51 | Mount Hope |
| 5 | 38 | Ivanhoe | 54 | Ivanhoe Southeast |
| 199 | 38 | Ivanhoe Neighborhood | 54 | Ivanhoe Southeast |
| 235 | 38 | Ivanhoe Se | 54 | Ivanhoe Southeast |
| 252 | 38 |  | 54 | Ivanhoe Southeast |
| 130 | 39 |  | 134 | East Swope Highlands |
| 131 | 39 | Manchester | 134 | East Swope Highlands |
| 355 | 39 | East Swope Highlands | 134 | East Swope Highlands |
| 356 | 39 | East Meyer | 134 | East Swope Highlands |
| 277 | 40 |  | 122 | Blenheim Square Research Hospital |
| 8 | 41 | Union Hill | 11 | Union Hill |
| 30 | 41 | Union hill | 11 | Union Hill |
| 110 | 41 | Ivanhoe, olive st. | 11 | Union Hill |
| 227 | 41 | The 30s | 11 | Union Hill |
| 120 | 42 |  | 59 | Oak Park Southwest |
| 165 | 42 | Oak park | 59 | Oak Park Southwest |
| 366 | 42 | Oak park southwest | 59 | Oak Park Southwest |
| 368 | 42 | Ivanhoe | 59 | Oak Park Southwest |
| 370 | 43 | Rosehill | NA | NA |
| 371 | 43 |  | NA | NA |
| 372 | 43 | Bristol Park | NA | NA |
| 377 | 43 | Pembrooke Estates | NA | NA |
| 379 | 43 | Brandon mosley | NA | NA |
| 380 | 43 | Rosehill townhomes | NA | NA |
| 381 | 43 | Oak crest | NA | NA |
| 382 | 43 | Rolling Meadows | NA | NA |
| 383 | 43 | Tiffany Springs | NA | NA |
Scrolling through the table above offers a basic sense of the differences in how residents identify their neighborhoods. Despite substantial overlap with the official names, residents’ reports of their neighborhood vary much more than the official boundaries would suggest, as anticipated (cf. here, here, here, and here). Nonetheless, for this initial report, we will rely on the official neighborhood boundaries since they permit direct comparisons with the aggregated crime data we received.
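The unique-combination comparison described in this section can be sketched in base R on toy data. The data frames `survey_toy` and `official_toy`, the column `official_name`, and the join on `NBHD` are assumptions for illustration; the case-insensitive match flag is an extra convenience and not something the report itself computes.

```r
# Hypothetical toy survey data; NBHD and Q1_1_Text mirror the variable names in the text
survey_toy <- data.frame(
  NBHD      = c(8, 8, 8, 11, 11, 11),
  Q1_1_Text = c("Somerset", "Ravenwood-Somerset", "Somerset",
                "Lykins", "Northeast", "lykins"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup of official names keyed by NBHD (column names are assumptions)
official_toy <- data.frame(
  NBHD          = c(8, 11),
  official_name = c("Ravenwood-Somerset", "Lykins"),
  stringsAsFactors = FALSE
)

# Unique NBHD x self-reported name combinations
combos <- unique(survey_toy[, c("NBHD", "Q1_1_Text")])

# Attach official names; all.x = TRUE keeps combinations with no official match
combos <- merge(combos, official_toy, by = "NBHD", all.x = TRUE)

# Flag exact matches after ignoring case and surrounding whitespace
combos$matches_official <-
  tolower(trimws(combos$Q1_1_Text)) == tolower(trimws(combos$official_name))
```

A flag like `matches_official` only catches trivial differences in capitalization. Misspellings and alternative names of the kind listed in the table would still need manual review.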
6.0.4.2 Aggregate Key Measures to the Neighborhood Level
The next step is to aggregate the survey data and merge it with the crime rate data. All we need for this are the NBHD variable that indicates the neighborhoods sampled and the specific measures we want to aggregate (e.g., collective efficacy, perceived violence, and experienced victimization). In addition to aggregating the data by calculating the mean of each of our key variables, we will also calculate standard errors and confidence intervals for the mean that we will use later to build our intuition about the uncertainty of these estimates of neighborhood-level values.
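As a minimal sketch of this aggregation step, the base R code below computes per-neighborhood counts, means, standard errors, and t-based 95% confidence intervals for one scale. The toy data, the helper `summarize_nbhd()`, and the output column names are hypothetical; the report's own aggregated variables (e.g., `mean_cohesion`, `sem_cohesion`) are presumably built with a dplyr pipeline that is not shown here.

```r
# Hypothetical toy data; soc_cohesion mirrors a scale name used in this report
survey_toy <- data.frame(
  NBHD         = c(1, 1, 1, 1, 2, 2, 2),
  soc_cohesion = c(2, 3, 4, NA, 1, 5, 3)
)

# Summarize one neighborhood's responses:
# total n, missing n, mean, standard error, and t-based 95% CI bounds
summarize_nbhd <- function(x) {
  n_total <- length(x)
  n_miss  <- sum(is.na(x))
  n_eff   <- n_total - n_miss                  # effective sample size
  m       <- mean(x, na.rm = TRUE)
  sem     <- sd(x, na.rm = TRUE) / sqrt(n_eff) # NA when n_eff < 2
  t_crit  <- qt(0.975, df = n_eff - 1)         # two-sided 95% critical value
  c(n = n_total, n_miss = n_miss, mean = m, sem = sem,
    low95ci = m - t_crit * sem, up95ci = m + t_crit * sem)
}

# Apply the summary to each neighborhood and bind the results into one row per NBHD
by_nbhd <- split(survey_toy$soc_cohesion, survey_toy$NBHD)
agg_mat <- do.call(rbind, lapply(by_nbhd, summarize_nbhd))
agg_toy <- data.frame(NBHD = as.numeric(rownames(agg_mat)), agg_mat,
                      row.names = NULL)
```

Because the critical value comes from a t-distribution with `n_eff - 1` degrees of freedom, intervals for neighborhoods with only a handful of respondents widen quickly, which is the intuition the plots in this section are meant to build.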
The above table presents the neighborhood-level means for the key scales we constructed and analyzed in the previous chapters.1 We also added the total number of observations and the total number of missing observations for each scale. Comparing these rows (and subtracting the missing from the total) will give you a sense of the effective (sub)sample size from which each of these means was calculated. Neighborhoods with fewer effective observations will generally produce noisier mean values than those with more effective observations.
We can help build this intuition by plotting these mean values with their 95% confidence intervals based on a t-distribution.2 Additionally, we will show semi-transparent data points and box plots to visualize the degree of variation in the individual responses that make up each aggregated neighborhood-level “observation.”
library(ggplot2)
library(dplyr)
library(rlang) # For enquo() and as_name(), used alongside the {{ }} operator

### Neighborhood Plot Function
nbhd_plot <- function(
    data_agg,        # Aggregated dataset (e.g., kc_combsurv_agganal)
    data_ind,        # Individual dataset for hline (e.g., kc_combsurv_ceanal)
    y_var,           # Main y-variable (e.g., mean_cohesion)
    ymin_var,        # Lower CI variable (e.g., low95ci_cohesion_tdist)
    ymax_var,        # Upper CI variable (e.g., up95ci_cohesion_tdist)
    filter_na_var,   # Variable to check for NAs before filtering (e.g., sem_cohesion)
    hline_var_ind,   # Variable in raw data for hline (e.g., soc_cohesion)
    plot_title = NULL,      # Title for the plot
    group_var = NBHD,       # Grouping variable for x-axis (defaults to NBHD)
    order_var = n,          # Variable to order by on x-axis (defaults to n)
    point_color = "black",  # Color for geom_pointrange
    y_axis_breaks = NULL,   # Note: NULL suppresses y-axis breaks; pass waiver() for defaults
    y_coord_limits = NULL,
    x_axis_title = NULL) {

  # Prepare data for plotting (filtering and creating the n_lookup)
  plot_data <- data_agg %>%
    filter(!is.na({{ filter_na_var }}))

  # n_lookup needs to be created from the data that will be used in ggplot
  # and specifically from the 'group_var' and 'order_var' columns.
  # We need to ensure 'order_var' is treated as a symbol for creating the named vector.
  # This assumes 'group_var' and 'order_var' exist in 'plot_data'.
  # If 'order_var' is 'n', then this works.
  n_lookup_dynamic <- setNames(
    plot_data[[as_name(enquo(order_var))]], # Get the column specified by order_var
    plot_data[[as_name(enquo(group_var))]]  # Get the column specified by group_var
  )

  ggplot(plot_data,
         aes(x = reorder({{ group_var }}, {{ order_var }}), y = {{ y_var }})) +
    geom_pointrange(aes(ymin = {{ ymin_var }}, ymax = {{ ymax_var }}),
                    color = point_color) +
    geom_hline(data = data_ind,
               aes(yintercept = mean({{ hline_var_ind }}, na.rm = TRUE)),
               linetype = 2) +
    scale_y_continuous(breaks = y_axis_breaks) +
    coord_cartesian(ylim = y_coord_limits) +
    scale_x_discrete(
      labels = function(x_breaks) {
        # x_breaks are the actual values from the 'group_var' column in their plotted order
        as.character(n_lookup_dynamic[x_breaks])
      },
      name = x_axis_title
    ) +
    theme_minimal(base_size = 10) +
    labs(title = plot_title) +
    theme(
      panel.grid = ggplot2::element_blank(),
      axis.title.y = ggplot2::element_blank()
    )
}

### Neighborhood Box Plot Function with Data Points & Mean Overlay
nbhd_box_plot <- function(
    data_agg,        # Aggregated dataset (for means and CIs)
    data_ind,        # Individual dataset for box plots and points
    y_var_ind,       # Individual-level y-variable (e.g., soc_cohesion)
    y_var_agg,       # Aggregated mean variable (e.g., mean_cohesion)
    ymin_var,        # Lower CI variable (e.g., low95ci_cohesion_tdist)
    ymax_var,        # Upper CI variable (e.g., up95ci_cohesion_tdist)
    filter_na_var,   # Variable to check for NAs before filtering
    hline_var_ind,   # Variable in raw data for hline
    plot_title = NULL,
    group_var = NBHD,
    order_var = n,
    box_color = "black",    # Box outline color
    box_alpha = 0.8,        # Box outline transparency
    point_color = "gray60",
    point_alpha = 0.4,
    point_size = 0.8,
    y_axis_breaks = NULL,   # Note: NULL suppresses y-axis breaks; pass waiver() for defaults
    y_coord_limits = NULL,
    x_axis_title = NULL) {

  # Prepare aggregated data for ordering and means
  agg_data <- data_agg %>%
    filter(!is.na({{ filter_na_var }}))

  # Prepare individual data, filtering to neighborhoods in agg_data
  ind_data <- data_ind %>%
    filter({{ group_var }} %in% agg_data[[as_name(enquo(group_var))]]) %>%
    filter(!is.na({{ y_var_ind }}))

  # Create n_lookup for x-axis labels
  n_lookup_dynamic <- setNames(
    agg_data[[as_name(enquo(order_var))]],
    agg_data[[as_name(enquo(group_var))]]
  )

  # Get neighborhood order from aggregated data
  nbhd_order <- agg_data %>%
    arrange({{ order_var }}) %>%
    pull({{ group_var }})

  # Create the plot
  ggplot() +
    # Semi-transparent individual data points
    geom_point(data = ind_data,
               aes(x = factor({{ group_var }}, levels = nbhd_order),
                   y = {{ y_var_ind }}),
               color = point_color, alpha = point_alpha, size = point_size,
               position = position_jitter(width = 0.2, height = 0)) +
    # Box plots with transparent fill
    geom_boxplot(data = ind_data,
                 aes(x = factor({{ group_var }}, levels = nbhd_order),
                     y = {{ y_var_ind }}),
                 fill = "transparent", color = alpha(box_color, box_alpha),
                 outlier.shape = NA) +
    # Point-interval overlay (same color as box outline)
    geom_pointrange(data = agg_data,
                    aes(x = factor({{ group_var }}, levels = nbhd_order),
                        y = {{ y_var_agg }},
                        ymin = {{ ymin_var }}, ymax = {{ ymax_var }}),
                    color = box_color) +
    # Overall mean horizontal line
    geom_hline(data = data_ind,
               aes(yintercept = mean({{ hline_var_ind }}, na.rm = TRUE)),
               linetype = 2) +
    scale_y_continuous(breaks = y_axis_breaks) +
    coord_cartesian(ylim = y_coord_limits) +
    scale_x_discrete(
      labels = function(x_breaks) {
        as.character(n_lookup_dynamic[x_breaks])
      },
      name = x_axis_title
    ) +
    theme_minimal(base_size = 10) +
    labs(title = plot_title) +
    theme(
      panel.grid = element_blank(),
      axis.title.y = element_blank()
    )
}
In the above plots, the point-intervals clearly show the expected inverse relationship between sub-sample size and uncertainty in estimates, with extremely wide 95% CIs for aggregated neighborhood estimates composed of only a few responses (left side of each plot) that tend to narrow substantially as the number of observations per neighborhood increases.
With that said, the semi-transparent data point spreads and box plot widths behind the point-intervals also show that the increased “certainty” presumably gained via larger sample sizes masks substantial within-neighborhood heterogeneity in individual responses. You can also see some of the limits in trying to estimate uncertainty from the observed data, especially with neighborhood-level victimization. For some small sub-samples (e.g., n = 2 to 7), everyone surveyed in that neighborhood reported no experiences with criminal victimization, and thus there are no uncertainty estimates. We could estimate a simple multilevel model to get more reasonable (and more conservative) uncertainty estimates, but we’ll forgo that for now in order to actually map these values.
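The limitation noted above for victimization can be illustrated with a small base R sketch. The helper `ci95_t()` and the toy response vectors are hypothetical. When every respondent gives the same answer, the sample standard deviation is zero, so a t-based interval collapses to a single point regardless of how few people were surveyed. How the report's pipeline encodes this case is not shown here.

```r
# t-based 95% CI for a vector of responses (NAs dropped); hypothetical helper
ci95_t <- function(x) {
  x   <- x[!is.na(x)]
  n   <- length(x)
  sem <- sd(x) / sqrt(n)
  half_width <- qt(0.975, df = n - 1) * sem
  c(mean = mean(x), low = mean(x) - half_width, up = mean(x) + half_width)
}

# Toy neighborhood where all 5 respondents report no victimization:
# SD is 0, so the interval has zero width
ci_all_zero <- ci95_t(c(0, 0, 0, 0, 0))

# Same sample size with some variation, for contrast
ci_mixed <- ci95_t(c(0, 0, 1, 0, 2))
```

A zero-width interval from five identical responses plainly overstates what is known about that neighborhood, which is why a model that pools information across neighborhoods would give more conservative estimates.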
6.0.5 Mapping Survey Data
Since the aggregate survey data above only has the sampled neighborhoods, we first need to merge the data with the full crime rate data.
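A minimal sketch of this merge, using base R and toy data frames: `crime_toy`, `agg_toy`, their columns, and the shared key `NBHD` are all assumptions for illustration, and the report's actual key and spatial (sf) objects may differ. The idea is a left join that keeps every neighborhood in the crime rate data and leaves survey measures missing for neighborhoods that were not sampled.

```r
# Hypothetical full crime-rate table: every neighborhood, sampled or not
crime_toy <- data.frame(
  NBHD         = c(1, 2, 3, 4),
  nbhd_name    = c("A", "B", "C", "D"),
  violent_rate = c(12.1, 3.4, 25.0, 7.8)
)

# Hypothetical aggregated survey data: sampled neighborhoods only
agg_toy <- data.frame(
  NBHD          = c(1, 3),
  mean_cohesion = c(3.2, 2.6)
)

# Left join: keep every row of crime_toy;
# unsampled neighborhoods get NA for the survey measures
map_toy <- merge(crime_toy, agg_toy, by = "NBHD", all.x = TRUE)
```

Keeping the unsampled neighborhoods with `NA` values allows them to be drawn on a map in a neutral color instead of being dropped.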