11  Appendix II - Cleaning & Wrangling the Neighborhood Data

Show code
# Load Packages
library(tidyverse)
library(easystats)
library(gt)
library(sf)
library(haven)
library(readxl)
library(here)
library(janitor)
library(viridis)
library(plotly)
library(mapview)
library(leaflet)
library(htmlwidgets)
library(biscale)
library(base64enc)
library(htmltools)

options(scipen = 999)

11.0.1 Introduction

We are going to use this appendix to document our process for cleaning and wrangling the neighborhood-level data constructed from the individual-level community survey data (as well as official crime data). These data were used in “Chapter 6: Mapping Collective Efficacy” to examine neighborhood-level variation in collective efficacy and crime.

11.0.2 Load the Data

First, we need to load the crime rate data that was in the “Data” folder of the “ArcGIS Pro Project” folder shared with us via Dropbox. We will specifically use the CrimeRate Selected Neighborhoods.xls data as it includes an indicator variable for the neighborhoods that were sampled. It includes SHAPE_AREA and SHAPE_LEN variables that indicate geographic shape/location features, but not the full multipolygon geometry that may be necessary to construct a map (we’ll find out below). So we also downloaded the city’s neighborhood borders data (in September of 2024) from DataKC. On first glance, it appeared to have the same neighborhood information as included in the CrimeRate data, but we can check that to make sure. And, of course, we’ll also load the full recoded and analytical data from “Chapter 5: Measuring Collective Efficacy” that include our collective efficacy and crime-related items, subscales, and scales.

Show code
kcmo_crimerate_raw <- read_xls(here("Data", "GeoData", "CrimeRate Selected Neighborhoods.xls"))
kcmo_geo <- read_csv(here("Data", "GeoData", "Kansas_City_Neighborhood_Borders_20240918.csv"))
kcmo_district <- read_xls(here("Data", "GeoData", "Neighborhood_CouncilDistricts_Jon & Jake.xls"))
kc_combsurv_ceanal <- readRDS(here("Data", "kc_combsurv_ceanal.rds"))
kc_combsurv_recode <- readRDS(here("Data", "kc_combsurv_recode.rds"))

11.0.3 Clean the Geography/Crime Rate Data

When loading the crime rate data, a couple of things jumped out. First, there are multiple variables with the same names.1

Show code
names(kcmo_crimerate_raw)
 [1] "Selected"        "OBJECTID...2"    "OBJECTID...3"    "NBHID...4"      
 [5] "NBHNAME...5"     "SHAPE_AREA"      "SHAPE_LEN"       "OBJECTID...8"   
 [9] "NBHID...9"       "NBHNAME...10"    "RATE"            "COLOR"          
[13] "X"               "Y"               "LASTUPDATE"      "GLOBALID"       
[17] "SHAPE_Length"    "SHAPE_Area"      "OBJECTID1"       "BlockGroup"     
[21] "FREQUENCY"       "SUM_POP20"       "Property"        "Violent"        
[25] "Shape_Length"    "Shape_Area"      "TotalCrime"      "VC_1000_Rate"   
[29] "PC_1000_Rate"    "Crime_1000_Rate"

As you can see above, the kcmo_crimerate_raw data has 30 variables, many of which share a name. The most obvious cases are the three “OBJECTID…#” items (there is also an “OBJECTID1” variable) and what look like duplicate “NBHID…#” and “NBHNAME…#” variables. Fixing the names is generally easy (we can simply remove the duplicate columns). But before we do that, we want to make sure columns with the same name do indeed contain the same data.
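Those “…#” suffixes are the product of the name repair that readxl (via tibble) applies when a sheet contains duplicate column names. A minimal sketch of the same repair, built with tibble directly:

```r
library(tibble)

# Duplicate names get "...<column position>" appended during "unique" repair
tbl <- tibble(id = 1, id = 2, .name_repair = "unique")
names(tbl)
# "id...1" "id...2"
```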

Show code
head(kcmo_crimerate_raw) %>%
  gt() %>%
  tab_options(
  container.height = px(500),
  container.overflow.y = TRUE)
Selected OBJECTID...2 OBJECTID...3 NBHID...4 NBHNAME...5 SHAPE_AREA SHAPE_LEN OBJECTID...8 NBHID...9 NBHNAME...10 RATE COLOR X Y LASTUPDATE GLOBALID SHAPE_Length SHAPE_Area OBJECTID1 BlockGroup FREQUENCY SUM_POP20 Property Violent Shape_Length Shape_Area TotalCrime VC_1000_Rate PC_1000_Rate Crime_1000_Rate
NA 21 79 180 Glen Lake 1157349 4401.839 79 180 Glen Lake NA NA NA NA NA {14443097-6C3A-4850-9411-C008FBE487E3} 4401.839 1157349 75 Glen Lake 3 183 0 0 0.01402462 0.00001118447 0 0 0.00 0.00
NA 230 219 150 Verona Hills 20078778 22570.071 219 150 Verona Hills NA NA NA NA NA {A0E56E45-B652-4D40-ABA4-44994EF237DC} 22570.071 20078778 221 Verona Hills 42 2088 15 0 0.06955936 0.00019376292 15 0 7.18 7.18
NA 60 19 185 Blue Ridge Farms 39507133 29419.832 19 185 Blue Ridge Farms NA NA NA NA NA {26AE24F9-2EDE-421A-9426-851FD50D9BB9} 29419.832 39507133 14 Blue Ridge Farms 10 436 4 0 0.09173668 0.00038109110 4 0 9.17 9.17
NA 141 244 153 Woodbridge 6644703 10602.391 244 153 Woodbridge NA NA NA NA NA {22A79545-9E48-4876-9AD1-1B21D9CC8C6F} 10602.391 6644703 246 Woodbridge 6 811 9 0 0.03327565 0.00006410359 9 0 11.10 11.10
NA 74 204 100 Stratford Gardens 4646798 9067.142 204 100 Stratford Gardens NA NA NA NA NA {6E9F3AC7-B56F-4833-A641-F8E140A82013} 9067.142 4646798 201 Stratford Gardens 9 613 7 0 0.02723160 0.00004490581 7 0 11.42 11.42
NA 47 47 99 Country Club District 6297063 13005.396 47 99 Country Club District NA NA NA NA NA {A16D5D87-CF9F-4B5A-BE5D-E748709F6FB8} 13005.396 6297063 42 Country Club District 35 706 9 0 0.03797916 0.00006085462 9 0 12.75 12.75


From eyeballing the first six rows of the data above, it appears that the variables with the same name generally contain the same data. But there is one obvious exception: the OBJECTID...# variables (and the OBJECTID1 variable) do not always match. This is most obvious with OBJECTID...2 and OBJECTID...3, which sit beside each other in the data. While OBJECTID...8 appears to match OBJECTID...3, OBJECTID1 does not appear to match the other OBJECTID... columns. The different NBHNAME...# and NBHID...# items do appear to match.

You may also notice multiple sets of geographic variables–SHAPE_AREA and SHAPE_LEN, SHAPE_Area and SHAPE_Length, and Shape_Area and Shape_Length. While the first two sets appear to match, the third set appears to be completely different. We can ultimately cross-reference this with the data pulled directly from DataKC. For now, we want to more formally test that these various “duplicate” items are indeed the same (or different).

Show code
dupevar_test <- function(df, var1, var2) {
  
  # Element-wise equality; returns NA wherever either column is missing
  dupetest <- (df[[var1]] == df[[var2]])
  
  # Tabulate the TRUE/FALSE/NA results and label the table with the comparison
  tabyl(dupetest) %>%
    adorn_title(paste0(var1, " = ", var2), placement = "combined")
}
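The core of the helper is just element-wise `==`. Note that a missing value in either column yields `NA`, which `tabyl()` reports as its own row. A tiny base-R sketch of that behavior on hypothetical vectors:

```r
# Two "duplicate" columns: equal in two rows, unequal in one, NA in one
a <- c(1, 2, 3, NA)
b <- c(1, 2, 4, 4)

table(a == b, useNA = "ifany")
# FALSE: 1, TRUE: 2, <NA>: 1
```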

First, we can take a look at the OBJECTID...# items.

Show code
#OBJECTID Variables:
dupevar_test(df = kcmo_crimerate_raw, var1 = "OBJECTID...2", var2 = "OBJECTID...3") %>% gt()
OBJECTID...2 = OBJECTID...3/ n percent
FALSE 234 0.98734177
TRUE 3 0.01265823
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "OBJECTID...2", var2 = "OBJECTID...8") %>% gt()
OBJECTID...2 = OBJECTID...8/ n percent
FALSE 234 0.98734177
TRUE 3 0.01265823
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "OBJECTID...3", var2 = "OBJECTID...8") %>% gt()
OBJECTID...3 = OBJECTID...8/ n percent
TRUE 237 1
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "OBJECTID...2", var2 = "OBJECTID1") %>% gt()
OBJECTID...2 = OBJECTID1/ n percent
FALSE 235 0.991561181
TRUE 2 0.008438819
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "OBJECTID...3", var2 = "OBJECTID1") %>% gt()
OBJECTID...3 = OBJECTID1/ n percent
FALSE 237 1

As you can see above, OBJECTID...3 and OBJECTID...8 are identical to each other but almost completely different from OBJECTID...2 (matching on only 3 values). OBJECTID...2 is also mostly different from OBJECTID1 (matching on only 2 values), and OBJECTID1 is completely different from OBJECTID...3 and OBJECTID...8. We will need to keep three different versions of the OBJECTID variables for now and compare them to the DataKC data to confirm which aligns with the public data.

Next we can look at the NBHID and NBHNAME variables.

Show code
#NBHID Variables:
dupevar_test(df = kcmo_crimerate_raw, var1 = "NBHID...4", var2 = "NBHID...9") %>% gt()
NBHID...4 = NBHID...9/ n percent
TRUE 237 1
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "NBHNAME...5", var2 = "NBHNAME...10") %>% gt()
NBHNAME...5 = NBHNAME...10/ n percent
FALSE 1 0.004219409
TRUE 236 0.995780591

These items are exact matches except for one observation between the two NBHNAME items.

Show code
kcmo_crimerate_raw %>%
  select(NBHNAME...5, NBHNAME...10) %>%
  filter(NBHNAME...5 != NBHNAME...10) %>%
  print()
# A tibble: 1 × 2
  NBHNAME...5             NBHNAME...10           
  <chr>                   <chr>                  
1 Noble And Gregory Ridge Noble and Gregory Ridge

As you can see above, the one difference is the result of “And” being capitalized in NBHNAME...5 but not in NBHNAME...10. This gives us confidence that the NBHID variable is capturing distinct neighborhoods. We will cross-reference it with the data from DataKC to settle on which version we will use.
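That lone mismatch disappears under a case-insensitive comparison:

```r
# The two stored names differ only in the capitalization of "And"
tolower("Noble And Gregory Ridge") == tolower("Noble and Gregory Ridge")
# TRUE
```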

Finally, we can look at the shape variables.

Show code
#Shape Variables
#Area
dupevar_test(df = kcmo_crimerate_raw, var1 = "SHAPE_AREA", var2 = "SHAPE_Area") %>% gt()
SHAPE_AREA = SHAPE_Area/ n percent
FALSE 237 1
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "SHAPE_AREA", var2 = "Shape_Area") %>% gt()
SHAPE_AREA = Shape_Area/ n percent
FALSE 237 1
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "SHAPE_Area", var2 = "Shape_Area") %>% gt()
SHAPE_Area = Shape_Area/ n percent
FALSE 237 1
Show code
#Length
dupevar_test(df = kcmo_crimerate_raw, var1 = "SHAPE_LEN", var2 = "SHAPE_Length") %>% gt()
SHAPE_LEN = SHAPE_Length/ n percent
FALSE 237 1
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "SHAPE_LEN", var2 = "Shape_Length") %>% gt()
SHAPE_LEN = Shape_Length/ n percent
FALSE 237 1
Show code
dupevar_test(df = kcmo_crimerate_raw, var1 = "SHAPE_Length", var2 = "Shape_Length") %>% gt()
SHAPE_Length = Shape_Length/ n percent
FALSE 237 1

As you can see in the results of the duplication test, all of the shape variables are different. This is confusing because, for example, SHAPE_AREA and SHAPE_Area look identical when eyeballing the printed data. For those two (and the analogous length variables), the difference is a matter of stored precision: the values are floating point numbers carried to different numbers of decimal places, and the extra decimals simply don’t show up in the default data frame printout. The lowercase Shape_Area and Shape_Length, by contrast, are on an entirely different scale (they appear to have been computed in a different coordinate system). You can see this when we convert the data to character values and print the first six rows of the data frame.
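The display rounding is easy to reproduce on a single value (here the first SHAPE_Area figure from the data): R prints doubles with 7 significant digits by default, while `format()` can reveal the full stored value.

```r
x <- 11127114.0169353

print(x)                # default digits = 7 hides the fractional part
format(x, digits = 15)  # "11127114.0169353"
```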

Show code
kcmo_crimerate_raw %>%
  select(NBHID...4, SHAPE_AREA, SHAPE_Area, Shape_Area,
         SHAPE_LEN, SHAPE_Length, Shape_Length
         ) %>%
  arrange(NBHID...4) %>%
  mutate(across(everything(), as.character)) %>%
  head() %>%
  gt()
NBHID...4 SHAPE_AREA SHAPE_Area Shape_Area SHAPE_LEN SHAPE_Length Shape_Length
1 11127114.0169 11127114.0169353 0.000107679964378077 14648.8655362 14648.8655362293 0.04563783990266
2 9742312.98781 9742312.98781225 0.0000942717625207003 13172.9838493 13172.983849271 0.0410452407100574
3 5395420.65936 5395420.65936544 0.0000522023426394401 10200.1884386 10200.188438621 0.0303893608959742
4 17274149.2768 17274149.2768049 0.000167131690021752 17190.5486385 17190.5486385225 0.0550187826643754
5 8941995.18645 8941995.18645213 0.0000865161932750384 11695.4943336 11695.4943335761 0.0354745451094779
6 13099376.9126 13099376.9125981 0.000126717851680561 15696.6951573 15696.6951573032 0.046690091311547


We’re not too concerned about this and, if we had to choose one, we’d choose the more precise number. But all of this reinforces the necessity of using the DataKC geography data directly. In fact, we can check the DataKC geography data to see at what level of precision its geography variables are stored.

Show code
kcmo_geo %>%
  select(-the_geom) %>%
  filter(NBHID != 0) %>%
  arrange(NBHID) %>%
  mutate(across(everything(), as.character)) %>%
  head() %>%
  gt() 
NBHNAME OBJECTID NBHID SHAPE_AREA SHAPE_LEN
Columbus Park Industrial 44 1 11127114.0169 14648.8655362
River Market 170 2 9742312.98781 13172.9838493
Quality Hill 162 3 5395420.65936 10200.1884386
CBD Downtown 32 4 17274149.2768 17190.5486385
Paseo West 156 5 8941995.18645 11695.4943336
Hospital Hill 94 6 13099376.9126 15696.6951573

It’s pretty clear that the all-caps versions of the SHAPE_AREA and SHAPE_LEN variables in our crime rate data are very likely the same as those in the data from the city. All are stored as 13 characters (12 digits plus the decimal point). We will confirm this by merging the city data with the crime rate data. But first, we will want to trim the crime rate data to just the unique variables and rename some of them so they do not duplicate the variable names in the DataKC data.
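A quick base-R check of that character count, using the first SHAPE_AREA value from the table above:

```r
val <- "11127114.0169"   # first SHAPE_AREA value in the DataKC data
nchar(val)               # 13: 12 digits plus the decimal point
```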

Show code
kcmo_crimerate_geo <- kcmo_crimerate_raw %>%
  select(Selected, OBJECTID...2, OBJECTID...3, OBJECTID1, NBHID...4, 
         SHAPE_AREA, SHAPE_LEN, NBHNAME...5,
         RATE, COLOR, X, Y, LASTUPDATE, GLOBALID, SHAPE_Length, SHAPE_Area,
         BlockGroup, FREQUENCY, SUM_POP20, Property, Violent,
         Shape_Length, Shape_Area, TotalCrime, VC_1000_Rate, PC_1000_Rate, Crime_1000_Rate) %>%
  rename(OBJECTID_A = OBJECTID...2,
         OBJECTID_raw = OBJECTID...3, 
         NBHID_raw = NBHID...4,
         SHAPE_AREA_raw = SHAPE_AREA,
         SHAPE_LEN_raw = SHAPE_LEN,
         NBHNAME_raw = NBHNAME...5) %>%
  full_join(kcmo_geo, by = c("NBHID_raw" = "NBHID"), keep = TRUE)

One thing to note about this merged data is that it has 9 more observations than the original crime rate data. This difference is explained by the 6 observations with 0 for NBHID and NA for NBHNAME in the city’s data, plus 3 neighborhoods that are missing from the crime rate data altogether. You can see this clearly in the simple table below. We believe these 3 were the neighborhoods that were intentionally selected out of the sampling frame for methodological reasons.
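The row accounting above can be sketched with a toy `full_join()`; the data here are hypothetical, and `keep = TRUE` is what retains both ID columns in the merged result:

```r
library(dplyr)
library(tibble)

crime <- tibble(NBHID_raw = c(1, 2), TotalCrime = c(10, 20))
city  <- tibble(NBHID = c(2, 3, 0), NBHNAME = c("B", "C", NA))

# Unmatched rows from BOTH sides survive a full join
merged <- full_join(crime, city, by = c("NBHID_raw" = "NBHID"), keep = TRUE)
merged
# 4 rows: one matched (2), one crime-only (1), two city-only (3 and 0),
# with NBHID_raw and NBHID both kept as columns
```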

Show code
kcmo_crimerate_geo %>%
  filter(is.na(FREQUENCY)) %>%
  arrange(NBHID) %>%
  select(NBHID, NBHNAME, TotalCrime) %>%
  gt()
NBHID NBHNAME TotalCrime
0 NA NA
0 NA NA
0 NA NA
0 NA NA
0 NA NA
0 NA NA
130 Marlborough East NA
155 Sechrest NA
187 Harlem NA

Now we should be able to use our dupevar_test() function to confirm which variables in the crime rate data are the same as the city’s data. Of course, now that we have merged it with the city data, this is all kind of moot.2

Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "OBJECTID", var2 = "OBJECTID_A") %>% gt()
OBJECTID = OBJECTID_A/ n percent valid_percent
FALSE 234 0.95121951 0.98734177
TRUE 3 0.01219512 0.01265823
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "OBJECTID", var2 = "OBJECTID_raw") %>% gt()
OBJECTID = OBJECTID_raw/ n percent valid_percent
TRUE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "NBHID", var2 = "NBHID_raw") %>% gt()
NBHID = NBHID_raw/ n percent valid_percent
TRUE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "SHAPE_AREA", var2 = "SHAPE_AREA_raw") %>% gt()
SHAPE_AREA = SHAPE_AREA_raw/ n percent valid_percent
TRUE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "SHAPE_AREA", var2 = "SHAPE_Area") %>% gt()
SHAPE_AREA = SHAPE_Area/ n percent valid_percent
FALSE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "SHAPE_AREA", var2 = "Shape_Area") %>% gt()
SHAPE_AREA = Shape_Area/ n percent valid_percent
FALSE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "SHAPE_LEN", var2 = "SHAPE_LEN_raw") %>% gt()
SHAPE_LEN = SHAPE_LEN_raw/ n percent valid_percent
TRUE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "SHAPE_LEN", var2 = "SHAPE_Length") %>% gt()
SHAPE_LEN = SHAPE_Length/ n percent valid_percent
FALSE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "SHAPE_LEN", var2 = "Shape_Length") %>% gt()
SHAPE_LEN = Shape_Length/ n percent valid_percent
FALSE 237 0.96341463 1
NA 9 0.03658537 NA
Show code
dupevar_test(df = kcmo_crimerate_geo, var1 = "NBHNAME", var2 = "NBHNAME_raw") %>% gt()
NBHNAME = NBHNAME_raw/ n percent valid_percent
TRUE 237 0.96341463 1
NA 9 0.03658537 NA

Now, we can trim the crime rate data to just those variables we need to identify neighborhood boundaries, map crime, and ultimately merge with our survey data. We will also create a factor variable that reflects the specific sample (Not Sampled vs. Random Sample vs. High Crime Sample) from which the neighborhood was drawn (or not).

Show code
kcmo_crimerate <- kcmo_crimerate_geo %>%
  select(Selected, NBHID, NBHNAME, OBJECTID, SHAPE_AREA, SHAPE_LEN, the_geom,
         FREQUENCY:Crime_1000_Rate, LASTUPDATE) %>%
  mutate(sample = ifelse(is.na(Selected), 0, 1),
         sample = ifelse(NBHID %in% c(11, 59), 1, sample),
         sample = ifelse(Selected == 1 & NBHID %in% c(122, 134, 54, 51, 81, 66, 55, 56, 37, 5), 2, sample)
  ) %>%
  mutate(sample_fact = factor(sample,
                              labels = c("Not Sampled", "Random Sample", "High Crime Sample"))
  ) %>%
  select(-Selected) 
Show code
saveRDS(kcmo_crimerate, here("Data", "kcmo_crimerate.rds"))

11.0.4 Prepare the Geography & Crime Rate Data for Mapping

The Kansas City neighborhood data from DataKC that we joined/merged with the crime rate data included the relevant “features,” or geographic information, we needed to plot maps of the data. Specifically, the the_geom variable in that data contains the geographic information needed to draw our maps. In drawing the maps, we will work with the “sf” package, which is one of the go-to packages for working with geographic shape/feature data (the sf stands for “simple features”). The “sf” package has a website with multiple vignettes for working with geographic data.

The sf package includes multiple operations for working with geographic features. In fact, we could have used it to directly read the shape (.shp) files that Dr. Kotlaja provided us. However, given our simpler goals–to map neighborhood variation in collective efficacy and crime–the geographic features of the neighborhoods are all we really need. So we will use the st_as_sfc() and st_sf() functions from the sf package to build a spatial data frame from the geographic information already included in our merged data.
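A minimal sketch of the same WKT-to-sf workflow on a toy polygon (the WKT string below is a stand-in for a the_geom value):

```r
library(sf)

# WKT text (the format stored in the_geom) → sfc geometry column
wkt  <- "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"
geom <- st_as_sfc(wkt, crs = 4326)

# Attach the geometry to attribute data to get an sf data frame
toy_sf <- st_sf(data.frame(NBHID = 1), geometry = geom)
class(toy_sf)
# "sf" "data.frame"
```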

Show code
# Parse the WKT text in the_geom into an sfc geometry column
crimerate_geom <- st_as_sfc(kcmo_crimerate$the_geom) 

# Create the sf data frame (EPSG:4326, i.e., WGS84 longitude/latitude)
kcmo_crimerate_sf <- st_sf(kcmo_crimerate, geometry = crimerate_geom, crs = 4326)

names(kcmo_crimerate_sf)
 [1] "NBHID"           "NBHNAME"         "OBJECTID"        "SHAPE_AREA"     
 [5] "SHAPE_LEN"       "the_geom"        "FREQUENCY"       "SUM_POP20"      
 [9] "Property"        "Violent"         "Shape_Length"    "Shape_Area"     
[13] "TotalCrime"      "VC_1000_Rate"    "PC_1000_Rate"    "Crime_1000_Rate"
[17] "LASTUPDATE"      "sample"          "sample_fact"     "geometry"       
Show code
saveRDS(kcmo_crimerate_sf, here("Data", "kcmo_crimerate_sf.rds"))

11.0.4.1 Identify Neighborhoods in Survey Data

The survey data does not have a corresponding NBHID variable that directly maps onto the same variable in the geo-coded data files we were just working with. Of course, given that we know which neighborhoods were included in our sample, and that there is a neighborhood number identifier (coded as NBHD) in the survey data, we can link the survey data to the geographic data with a little bit of work.

The first thing we’ll do is identify which neighborhoods are included in our recoded survey data.

Show code
kc_combsurv_recode %>%
  tabyl(NBHD) %>%
  zap_label() %>%
  gt() %>%
    tab_options(
      container.height = px(500),
      container.overflow.y = TRUE)
NBHD n percent
1 9 0.023376623
2 11 0.028571429
3 8 0.020779221
5 6 0.015584416
6 7 0.018181818
7 7 0.018181818
8 7 0.018181818
9 2 0.005194805
10 3 0.007792208
11 14 0.036363636
12 23 0.059740260
13 9 0.023376623
14 25 0.064935065
15 5 0.012987013
16 13 0.033766234
17 10 0.025974026
18 7 0.018181818
19 8 0.020779221
21 21 0.054545455
22 2 0.005194805
23 3 0.007792208
24 1 0.002597403
25 7 0.018181818
26 17 0.044155844
27 10 0.025974026
28 11 0.028571429
29 14 0.036363636
30 2 0.005194805
31 5 0.012987013
33 18 0.046753247
34 20 0.051948052
35 10 0.025974026
36 10 0.025974026
37 3 0.007792208
38 10 0.025974026
39 8 0.020779221
40 4 0.010389610
41 13 0.033766234
42 9 0.023376623
43 13 0.033766234


As you can see in the above table, the NBHD variable simply numbers the sampled neighborhoods sequentially. Dr. Kotlaja shared the corresponding NBHID values from the crime rate data, so we can create that variable within our survey data and merge it with the crime rate data.

Show code
kc_combsurv_geoid <- kc_combsurv_recode %>%
  mutate(NBHID = case_match(
    NBHD,
    1 ~ 217,
    2 ~ 235,
    3 ~ 240,
    4 ~ 236,
    5 ~ 232,
    6 ~ 219,
    7 ~ 221,
    8 ~ 205,
    9 ~ 210,
    10 ~ 207,
    11 ~ 17,
    12 ~ 12,
    13 ~ 62,
    14 ~ 31,
    15 ~ 60,
    16 ~ 76,
    17 ~ 80,
    18 ~ 2,
    19 ~ 192,
    20 ~ 79,
    21 ~ 171,
    22 ~ 128,
    23 ~ 131,
    24 ~ 178,
    25 ~ 169,
    26 ~ 106,
    27 ~ 142,
    28 ~ 110,
    29 ~ 95,
    30 ~ 186,
    31 ~ 5,
    32 ~ 37,
    33 ~ 56,
    34 ~ 55,
    35 ~ 66,
    36 ~ 81,
    37 ~ 51,
    38 ~ 54,
    39 ~ 134,
    40 ~ 122,
    41 ~ 11,
    42 ~ 59)
    ) %>%
  dplyr::distinct(NBHD, NBHID) %>%
  full_join(kcmo_crimerate_sf)

saveRDS(kc_combsurv_geoid, here("Data", "kc_combsurv_geoid.rds"))
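One property of case_match() worth noting: inputs with no matching clause come back as NA (for example, NBHD 43 appears in the survey tabyl above but has no crosswalk entry, so it would surface as a missing NBHID). A toy sketch:

```r
library(dplyr)

nbhd <- c(1, 2, 43)
case_match(nbhd, 1 ~ 217, 2 ~ 235)
# 217 235  NA  → unmapped values are easy to flag afterwards
```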

  1. When readxl (via tibble’s name repair) encounters duplicate column names, it appends “…#”, where the # represents the column number where the variable is located in the data.↩︎

  2. We also already made some decisions in this regard. For example, recall that the two NBHNAME...# variables in the crime rate data had one value that was different - capital “And” vs. lowercase “and”. We simply went to that value in the city’s data to determine which was correct.↩︎