We are going to use this appendix to document our process for cleaning and wrangling the neighborhood-level data constructed from the individual-level community survey data (as well as official crime data). These data were used in “Chapter 6: Mapping Collective Efficacy” to examine neighborhood-level variation in collective efficacy and crime.
11.0.2 Load the Data
First, we need to load the crime rate data that was in the “Data” folder of the “ArcGIS Pro Project” folder shared with us via Dropbox. We will specifically use the CrimeRate Selected Neighborhood.xls data, as it includes an indicator variable for the neighborhoods that were sampled. It includes SHAPE_AREA and SHAPE_LEN variables that capture geographic shape/location features, but not the full multipolygon that may be necessary to construct a map (we’ll find out below). So we also downloaded the city’s neighborhood data (in September of 2024) from DataKC. At first glance, it appeared to have the same neighborhood information as the CrimeRate data, but we can check that to make sure. And, of course, we’ll also load the full recoded and analytical data from “Chapter 5: Measuring Collective Efficacy” that includes our collective efficacy and crime-related items, subscales, and scales.
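Below is a minimal sketch of this loading step. The file paths, the name of the DataKC export, and the name of the saved Chapter 5 object are assumptions for illustration; only the CrimeRate Selected Neighborhood.xls file name comes from the folder described above.

```r
library(readxl) # read the .xls crime rate file
library(readr)  # read the DataKC export

# Hypothetical paths: adjust to match the local project folders
kcmo_crimerate_raw <- read_excel("ArcGIS Pro Project/Data/CrimeRate Selected Neighborhood.xls")
kcmo_nbh_raw <- read_csv("Data/kcmo_neighborhoods_datakc.csv")

# The recoded analytical data from Chapter 5 (hypothetical file name)
load("Data/ch5_recoded_data.RData")
```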
As you can see above, the kcmo_crimerate_raw data has 30 variables, many of which share the same name. The most common case is the three “OBJECTID…#” items (there is also an “OBJECTID1” variable), along with what look like two “NBHID…#” and two “NBHNAME…#” variables. Fixing the names is generally easy (we can simply remove the duplicate columns). But before we do that, we want to make sure columns with the same name do indeed contain the same data.
From eyeballing the first six rows of the data above, it appears that the variables with the same name generally contain the same data. But there is one obvious exception: the OBJECTID...# variables (and the OBJECTID1 variable) do not always match. This is most obvious with OBJECTID...2 and OBJECTID...3, which sit beside each other in the data. While OBJECTID...8 appears to match OBJECTID...3, OBJECTID1 does not appear to match the other OBJECTID... columns. The various NBHNAME...# and NBHID...# items do appear to match.
You may also notice multiple sets of geographic variables: SHAPE_AREA and SHAPE_LEN, SHAPE_Area and SHAPE_Length, and Shape_Area and Shape_Length. While the first two sets appear to match, the third set appears to be completely different. We can ultimately cross-reference these with the data pulled directly from DataKC. For now, we want to more formally test whether these various “duplicate” items are indeed the same (or different).
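To make the logic of that test concrete, here is a minimal sketch of the kind of pairwise comparison our dupevar_test() function performs; the actual function used in the book may differ in its details and output.

```r
library(tibble)

# Count how many values in two columns match, and flag exact duplicates
dupevar_test <- function(data, var1, var2) {
  n_match <- sum(data[[var1]] == data[[var2]], na.rm = TRUE)
  tibble(
    comparison = paste(var1, "vs.", var2),
    n_match    = n_match,
    n_total    = nrow(data),
    identical  = n_match == nrow(data)
  )
}

# Example: compare two of the OBJECTID columns
dupevar_test(kcmo_crimerate_raw, "OBJECTID...2", "OBJECTID...3")
```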
As you can see above, OBJECTID...3 and OBJECTID...8 are identical to each other but almost completely different from OBJECTID...2 (matching on only 3 values). OBJECTID1 is likewise mostly different from OBJECTID...2 (matching on only 2 values) and completely different from OBJECTID...3 and OBJECTID...8. We will need to keep three different versions of the OBJECTID variables for now and compare them to the DataKC data to confirm which aligns with the public data.
Next we can look at the NBHID and NBHNAME variables.
```
# A tibble: 1 × 2
  NBHNAME...5             NBHNAME...10
  <chr>                   <chr>
1 Noble And Gregory Ridge Noble and Gregory Ridge
```
As you can see above, the one difference is the result of “And” being capitalized in NBHNAME...5 but not in NBHNAME...10. This gives us confidence that the NBHID variable is capturing distinct neighborhoods. We will cross-reference this with the data from DataKC to settle on which version we will use.
As you can see in the results of the duplication test, all of the shape variables are different. This is confusing because, for example, the SHAPE_AREA and SHAPE_Area variables look the same when eyeballing them. Apparently, this is a result of the values (floating point numbers) being stored at different levels of decimal precision. While the extra decimals don’t show up in the printed data frame output, they’re there beneath the surface. You can see this when we convert the values to character and print the first six rows of the data frame.
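Here is a sketch of that character-conversion check, assuming the raw data object is named kcmo_crimerate_raw; note that contains() matches case-insensitively, so it picks up all three spellings of the shape variables.

```r
library(dplyr)

# Convert the shape variables to character to expose the full stored precision
kcmo_crimerate_raw |>
  select(contains("shape")) |>
  mutate(across(everything(), as.character)) |>
  head()
```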
We’re not too concerned about this and, if we had to choose one, we’d choose the more precise number. But all of this reinforces the necessity of using the DataKC geography data directly. In fact, we can check the DataKC geography data to see at what level of precision its geography variables are stored.
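One simple way to check, assuming the city’s data is in kcmo_nbh_raw, is to count the characters each shape value occupies once converted to text:

```r
library(dplyr)

# How many characters does each shape value occupy when stored as text?
kcmo_nbh_raw |>
  summarize(across(contains("shape"), ~ max(nchar(as.character(.x)))))
```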
It’s pretty clear that the all-caps versions of the shape variables (SHAPE_AREA and SHAPE_LEN) in our crime rate data are very likely the same as those in the data from the city. All are stored as 13 characters (12 digits plus the decimal point). We will confirm this by merging the city data with the crime rate data. But first, we will want to trim the crime rate data to just the unique variables and perhaps rename some of them so they do not duplicate the variable names in the DataKC data.
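A sketch of the trim, rename, and merge steps follows; the column suffixes (e.g., ...4), the “_cr” renaming convention, and the choice of full_join() are illustrative rather than an exact record of our decisions.

```r
library(dplyr)

# Keep one copy of each variable and tag the crime rate shape variables so
# they don't collide with the city's versions after the merge
kcmo_crimerate_trim <- kcmo_crimerate_raw |>
  select(NBHID = NBHID...4, NBHNAME = NBHNAME...5,
         SHAPE_AREA_cr = SHAPE_AREA, SHAPE_LEN_cr = SHAPE_LEN,
         OBJECTID...2, OBJECTID...3, OBJECTID1)

# Merge the city's geography data with the trimmed crime rate data
kcmo_crimerate <- kcmo_nbh_raw |>
  full_join(kcmo_crimerate_trim, by = "NBHID")
```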
One thing to note about this merged data is that it has 3 more observations than the original data. This difference is explained in part by the 6 observations with 0 for NBHID and NA for NBHNAME in the city’s data and the 3 observations that are missing from the crime rate data. You can see this clearly in the simple table below. We believe these were the 3 neighborhoods that were intentionally selected out of the sampling frame for methodological reasons.
Now we should be able to use our dupevar_test() function to confirm which variables in the crime rate data are the same as the city’s data. Of course, now that we have merged it with the city data, this is all kind of moot.2
Now, we can trim the crime rate data to just those variables we need to identify neighborhood boundaries, map crime, and ultimately merge with our survey data. We will also create a factor variable that reflects the specific sample (Not Sampled vs. Random Sample vs. High Crime Sample) from which each neighborhood was drawn (or not).
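A minimal sketch of that step is below; the crime_rate and sampled variable names, and the 0/1/2 coding of the sampling indicator, are stand-ins rather than the actual names and codes in the data.

```r
library(dplyr)

kcmo_crimerate <- kcmo_crimerate |>
  # Keep only what we need for boundaries, mapping, and merging (illustrative)
  select(NBHID, NBHNAME, the_geom, crime_rate, sampled) |>
  # `sampled` and its 0/1/2 coding are hypothetical
  mutate(sample_type = factor(
    sampled,
    levels = c(0, 1, 2),
    labels = c("Not Sampled", "Random Sample", "High Crime Sample")
  ))
```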
11.0.4 Prepare the Geography & Crime Rate Data for Mapping
The Kansas City neighborhood data from DataKC that we joined/merged with the crime rate data includes the relevant “features,” or geographic information, we need to plot maps of the data. Specifically, the the_geom variable in that data includes the relevant geographic information to draw our maps. In drawing the maps, we will work with the “sf” package, one of the go-to packages for working with geographic shape/feature data (the “sf” stands for “simple features”). The “sf” package has a website with multiple vignettes for working with geographic data.
The sf package includes multiple operations for working with geographic features. In fact, we could have used it to directly read the shape (.shp) files that Dr. Kotlaja provided us. However, given our simpler goal of mapping neighborhood variation in collective efficacy and crime, the geographic features of the neighborhoods are all we really need. So we will use the st_as_sfc() and st_sf() functions from the sf package to point to the geographic information already included in our merged data.
```r
library(sf)

# Convert the WKT geometry strings into a simple features geometry column
crimerate_geom <- st_as_sfc(kcmo_crimerate$the_geom)

# Create the sf data frame (WGS 84, EPSG:4326)
kcmo_crimerate_sf <- st_sf(kcmo_crimerate, geometry = crimerate_geom, crs = 4326)
names(kcmo_crimerate_sf)
```
The survey data does not have a corresponding NBHID variable that directly maps onto the same variable in the geo-coded data files we were just working with. Of course, given that we know which neighborhoods were included in our sample, and there is a neighborhood number identifier (coded as NBHD), we can link the survey data to the geographic data with a little bit of work.
The first thing we’ll do is identify which neighborhoods are included in our recoded survey data.
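For example, assuming the recoded survey data object is named survey_recoded (a hypothetical name):

```r
library(dplyr)

# Tabulate the neighborhood identifier in the survey data
survey_recoded |>
  count(NBHD)
```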
As you can see in the above table, the NBHD variable simply numbers the neighborhoods sequentially. Dr. Kotlaja shared the corresponding NBHID values from the crime rate data, so we can create that variable within our survey data and merge it with the crime rate data.
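A sketch of that crosswalk-and-merge step is below; the NBHID values shown are placeholders for illustration, not the actual identifiers Dr. Kotlaja shared.

```r
library(dplyr)
library(tibble)

# Hypothetical crosswalk between the survey's sequential NBHD codes and the
# crime rate data's NBHID codes (values are placeholders)
nbh_crosswalk <- tribble(
  ~NBHD, ~NBHID,
      1,    101,
      2,    215,
      3,    307
)

# Attach NBHID to the survey data, then merge in the crime rate data
survey_recoded <- survey_recoded |>
  left_join(nbh_crosswalk, by = "NBHD") |>
  left_join(kcmo_crimerate, by = "NBHID")
```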
When R encounters duplicate column names, it simply appends “…#”, where the # represents the column number where the variable is located in the data.↩︎
We also already made some decisions in this regard. For example, recall that the two NBHNAME...# variables in the crime rate data had one value that differed: capital “And” vs. lowercase “and”. We simply checked that value in the city’s data to determine which was correct.↩︎