12  Appendix III - Research Design

12.1 Introduction

The following summarizes our understanding of KC Community Survey’s research design (per exchange from September 17, 2024):

12.2 Planned research design

The KC Community Survey conducted computer-assisted personal interviews (on tablets) with adults identified using stratified probability sampling of urban households in Kansas City, Missouri (KCMO). A two-stage stratified probability sampling procedure with stratified random oversampling was used to randomly select households in neighborhoods representing the general population living in each of the six KCMO administrative districts while also ensuring sufficient sampling coverage of households in neighborhoods identified as likely targets of crime-reduction initiatives.

The P.I. initially secured funding for a sample size of n=800 respondents age 18 and older. Results of the 2020 Census of population, households and dwellings in KC show that there are 508,090 persons registered in KC metro, of which 400,050 are age 18 and older. Given the specified sample size, the sampling error at a 95% confidence level was estimated to be ±3.46% (p≤0.05).
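The reported ±3.46% can be reproduced with the standard margin-of-error formula for a proportion. The following is a minimal check, assuming simple random sampling and the most conservative proportion (p = 0.5). Both assumptions are ours rather than stated in the design notes, and the calculation ignores any design effect from clustering or stratification.

```r
# Margin of error for a proportion at 95% confidence.
# Assumptions (not stated in the source): simple random sampling, p = 0.5.
n <- 800           # planned sample size
N <- 400050        # adults age 18 and older
p <- 0.5           # most conservative proportion
z <- qnorm(0.975)  # two-sided 95% critical value (about 1.96)

moe <- z * sqrt(p * (1 - p) / n)

# Optional finite population correction; negligible here because n is tiny relative to N
fpc <- sqrt((N - n) / (N - 1))
moe_fpc <- moe * fpc

round(100 * c(moe = moe, moe_fpc = moe_fpc), 2)
```

With or without the finite population correction, the result rounds to 3.46%.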

  1. Council districts represented the first level of stratification (total of 6 districts)
  • 5 randomly selected NH per district (5 NH * 6 districts = 30 randomly selected NH)
  2. Neighborhoods represented the second level of stratification (40 out of a total of 240 neighborhoods)
  • 30 randomly selected NH (5 per district)
  • 10 strategically selected NH to oversample “high risk” NH: the 10 NH with the highest violent crime rates as of the most recent official data from KC police, as these NH are places where anti-crime interventions are most likely to be implemented
  3. In each NH, 1 adult per household (HH) from 10 HH around 2 random starting points per NH were selected for interview (1 adult * 10 HH * 2 SP per NH = 20 adults per NH; 800 adults total)
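The allocation above can be sketched in a few lines of R. This is an illustration only: the sampling frame, the number of neighborhoods per district, and the seed are hypothetical and are not the files or values used by the research team.

```r
library(dplyr)

# Hypothetical sampling frame: 6 districts with 40 neighborhoods each
# (the real frame is a city administrative list; these IDs are made up).
frame <- tibble(
  district = rep(1:6, each = 40),
  nh_id    = seq_len(240)
)

set.seed(1)  # arbitrary seed for this illustration only

# 5 randomly selected neighborhoods within each council district
random_nh <- frame %>%
  group_by(district) %>%
  slice_sample(n = 5) %>%
  ungroup()

n_random    <- nrow(random_nh)  # 5 NH * 6 districts = 30
n_strategic <- 10               # high-violent-crime oversample
n_nh        <- n_random + n_strategic

# 1 adult * 10 HH * 2 SP per NH = 20 adults per NH
adults_per_nh <- 1 * 10 * 2
n_total <- n_nh * adults_per_nh  # 40 * 20 = 800
```

The totals match the planned design: 30 random plus 10 strategic neighborhoods, and 800 adults overall.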

Generalizability of the sample was maximized via random selection of starting points, random selection of households within cluster, and random selection of the household member selected for interview.

To conduct the survey, for each district the necessary number of starting points (two per neighborhood) was randomly selected from the total base of starting points (neighborhoods), using the Statistical Package for the Social Sciences version 21 (SPSS 21). At any selected starting point, no more than ten (10) surveys were permitted to be completed. Substitute random starting points were also selected so that, when a need arose during fieldwork for additional starting points or to alter some of them (e.g., inaccessible; non-residential; etc.), a pre-determined principled probability sampling approach would be used.

In each of the areas covered by a randomly chosen starting point, households in which surveys were conducted were selected using a pre-determined Random Walk Technique, which was aided and simplified using pre-specified random walking routes displayed for interviewers using mapping software on an assigned interviewer tablet. This classical method was used because adequate lists of residents were not available. Provided that the method was applied consistently and that survey non-participation was not systematically biased, it should have enabled the researchers to meet their goal of sample representativeness.

Where necessary at the selected starting or interview points, interviewers were instructed to make selections using the “right hand rule”. For example, when interviewers arrived at the address noted, they would turn to face away from the starting point and move to their right. The first household the interviewers encountered was the household where the first survey should be attempted. The interviewers were then instructed to continue moving in the same direction and on the same side of the street, selecting every third address as a household where surveying would be conducted, until the anticipated number of surveys for that starting point was complete.
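The interval rule can be illustrated with a short sketch. It assumes a hypothetical list of addresses already ordered along the walking route, and it reads "every third address" as a fixed interval of 3 after the first household. Refusals and non-contacts, which would extend the walk in practice, are not modeled.

```r
# Hypothetical ordered list of addresses along one walking route,
# in the direction of travel on one side of the street.
route <- paste("Address", 1:40)

# Select the first household, then every third address after it,
# until the quota for this starting point is reached.
select_households <- function(route, quota = 10, step = 3) {
  idx <- seq(from = 1, by = step, length.out = quota)
  route[idx[idx <= length(route)]]  # stop early if the route runs out
}

selected <- select_households(route)
selected
```

Here `quota = 10` follows the planned maximum of ten surveys per starting point; the function name and arguments are illustrative.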

In each of the households, the selection of respondents was performed by applying the Last Birthday Technique, which should have ensured a random selection of respondents within a household. That is, the household member age 18 or older who had most recently had a birthday (the member whose last birthday was closest to the date of contact) was selected for invitation to be surveyed. If that person was absent (or could not be reached for other non-contact reasons, e.g., no answer), the interviewer was instructed to arrange a different time for interviewing and/or a member of the research team would return to that household, before conducting the survey with another respondent from another household.
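As a rough sketch of the Last Birthday Technique, the function below picks the adult whose most recent birthday falls closest to the contact date. The household roster, the function name, and the age threshold argument are hypothetical, and birthdays on February 29 are not handled.

```r
# Hypothetical household roster; members, dates, and ages are made up.
household <- data.frame(
  member     = c("A", "B", "C"),
  birth_date = as.Date(c("1980-03-15", "1975-09-20", "2010-09-30")),
  age        = c(44, 49, 14)
)

# Return the adult whose most recent birthday is closest to the contact date.
# Note: Feb 29 birthdays are not handled in this sketch.
last_birthday_pick <- function(members, contact_date, min_age = 18) {
  adults <- members[members$age >= min_age, ]
  yr <- as.integer(format(contact_date, "%Y"))
  md <- format(adults$birth_date, "%m-%d")
  this_year <- as.Date(paste0(yr, "-", md), format = "%Y-%m-%d")
  prev_year <- as.Date(paste0(yr - 1, "-", md), format = "%Y-%m-%d")
  # Use this year's birthday if it has already occurred, else last year's
  not_yet <- this_year > contact_date
  last_bday <- this_year
  last_bday[not_yet] <- prev_year[not_yet]
  days_since <- as.numeric(contact_date - last_bday)
  adults[which.min(days_since), ]
}

picked <- last_birthday_pick(household, as.Date("2024-10-01"))
picked$member
```

In this example member C had the most recent birthday overall but is a minor, so member B is selected.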

Due to various implementation challenges (e.g., community-engaged volunteering and training hurdles; lower than expected response rates; longer than expected survey administration time), some deviations from the planned design occurred (e.g., an online administration option was added at the end in an effort to increase participation among those lacking time for an in-person interview), and the final “total” and “per-neighborhood” sample sizes were substantially smaller than planned (e.g., approximately n=386 surveys; see report analysis sections for more details). The project P.I. (Dr. Marijana Kotlaja) would know more about specific design and implementation details or deviations.

12.3 Random Sampling Interviewer Starting Points

Below is the code used to randomly sample starting points for each of the 40 sampled neighborhoods (provided to Dr. Kotlaja on September 24, 2024).

In this file, we start by using a population registry of addresses within the 40 previously selected neighborhoods to sample 8 “starting points” (SP) per neighborhood. The first 4 SP are the addresses at which interviewers begin attempting to interview residents and from which they systematically move to new potential addresses. The remaining 4 SP are reserve SP, to be used in the event that interview quotas (5 per SP) are not met at an initially selected SP. We later sampled 5 additional backup SPs per neighborhood (for 13 total) to be used if the first 8 selected were exhausted or otherwise unusable.

Though we share the code, to retain confidentiality, it is commented out or not evaluated and the corresponding randomly selected SP address file is not included.

After this, we sampled one additional neighborhood (NH) to replace an originally selected NH (due to lack of responses or valid residencies).

12.3.1 Load libraries & data

library(tidyverse)
library(readxl)
library(here)

# spdata <- read_excel("study_addresses.xls")
# spdata

12.3.2 Sample 8 SP per Neighborhood

set.seed(1138)

spsamp <- spdata %>%
  group_by(KCMO_NeighborhoodsCensus.NBHID) %>%
  slice_sample(n=8, replace=FALSE)
spsamp

Some of these starting points are unusable (e.g., building address demolished). In addition to the four reserve SPs per NH, we therefore sample five additional reserve SPs per NH.

12.3.3 Sample 13 SP per Neighborhood (4 original + 9 reserve)

set.seed(1138)

spsamp <- spdata %>%
  group_by(KCMO_NeighborhoodsCensus.NBHID) %>%
  slice_sample(n=13, replace=FALSE)
spsamp

12.3.4 Slice Main SP (first 4), Primary Reserve SP (next 4), and Secondary Reserve (last 5) per Neighborhood

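# first 8 of the 13 sampled SP per neighborhood
sptop8 <- spsamp %>%
  group_by(KCMO_NeighborhoodsCensus.NBHID) %>%
  slice_head(n=8)

# main SP: rows 1-4 within each neighborhood
spmain <- sptop8 %>%
  group_by(KCMO_NeighborhoodsCensus.NBHID) %>%
  slice_head(n=4)
spmain

# primary reserve SP: rows 5-8 within each neighborhood
spreserve <- sptop8 %>%
  group_by(KCMO_NeighborhoodsCensus.NBHID) %>%
  slice_tail(n=4)
spreserve

# secondary reserve SP: rows 9-13 within each neighborhood
spreserve2 <- spsamp %>%
  group_by(KCMO_NeighborhoodsCensus.NBHID) %>%
  slice_tail(n=5)
spreserve2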

12.3.5 Save to CSV files

# write_csv(spmain, "spmain.csv")
# write_csv(spreserve, "spreserve.csv")
# write_csv(spreserve2, "spreserve2.csv")

12.3.6 Backup Neighborhood Selection

12.3.6.1 Initial NH sample

Dr. Kotlaja’s team selected the initial sample of 40 neighborhoods. 30 NHs were selected via stratified random sampling using a city administrative list of NH names as the sampling frame. Three NHs were eliminated from the sampling frame due to identification as industrial districts. After stratified random sampling of 30 NHs, another 10 NHs were strategically selected for inclusion as potential targets of violent crime initiatives by starting at the top of a list of NHs with the highest reported violent crime rates in 2023 and selecting the first 10 that were not already included in the stratified random sample of 30 NHs.
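The strategic selection rule (take the highest-crime NHs, skipping any already drawn at random) could be expressed roughly as follows. The crime table, neighborhood labels, and rates are made up for illustration; this is not the team's actual procedure or data.

```r
library(dplyr)

# Hypothetical neighborhood-level crime data; names and rates are made up.
crime <- tibble(
  nh   = paste0("NH_", 1:15),
  rate = c(98, 95, 91, 90, 88, 85, 83, 80, 78, 75, 72, 70, 65, 60, 55)
)

# Hypothetical: neighborhoods already drawn in the stratified random sample
already_sampled <- c("NH_2", "NH_7", "NH_14")

# Walk down the list from the highest rate, skip neighborhoods already
# sampled, and keep the first 10 that remain.
strategic_nh <- crime %>%
  arrange(desc(rate)) %>%
  filter(!nh %in% already_sampled) %>%
  slice_head(n = 10)

strategic_nh$nh
```

Under this rule, a strategically selected NH that later proves unusable would be replaced by the next NH on the ranked list (NH_13 in this toy example).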

The following 30 NHs were selected via stratified random sampling of 5 NHs within each of 6 council districts. (Notes provided by NH association members about perceived issues with city NH names are included in parentheses.)

Council District 1

  1. Prairie Point-Wildberry
  2. New Mark
  3. Shoal Creek
  4. Gashland
  5. Outer Gashland-Nashua

Council District 2

  1. Barry Harbour
  2. The Coves
  3. Ravenwood-Somerset
  4. Maple Park
  5. Winnwood Gardens

Council District 3

  1. Lykins
  2. Longfellow
  3. Palestine East (Oak Park is in Here)
  4. Blue Valley
  5. Oak Park

Council District 4

  1. North Hyde Park
  2. Southmoreland
  3. River Market
  4. Davidson
  5. Rockhill

Council District 5

  1. Ruskin Heights
  2. East Meyer 6 (broken out into smaller segments: Neighbors United Together, East Meyer Cluster, East Meyer 6 vs. Fox Town East & West)
  3. Brown Estates
  4. Little Blue
  5. Hickman Mills South

Council District 6

  1. Tower Homes
  2. Linden Hills And Indian Heights
  3. Waldo Homes
  4. Morningside
  5. Richards Gebaur

The following 10 NHs were strategically selected based on high violent crime rates:

  1. Paseo West
  2. Cunningham Ridge
  3. Key Coalition
  4. Ivanhoe Northeast + Southeast
  5. Dunbar (Heart of the City)
  6. Old Westport
  7. Mount Hope (Boston Heights/Mount Hope)
  8. Ivanhoe Southeast
  9. East Swope Highlands
  10. Blenheim Square Research (Tri-Blenheim)

Excluded: Blue Valley Industrial, Northeast Industrial District, and Hospital Hill.

Community-Engaged Interview Targets:

  • 40 surveys per expert interviewer with 6 total (240 Interviews)
  • 10 surveys min per 30 community members = 300 minimum (560 Interviews)

12.3.6.2 NH replacement

After a couple of days of interviewing, Dr. Kotlaja’s team indicated a need to replace the following two NHs due to lack of responses and/or valid residencies:

  • Cunningham Ridge (District ?)
  • Rockhill (District 4)

Cunningham Ridge was a strategically sampled NH. Dr. Kotlaja’s team replaced it with Oak Park Southwest, which was the next NH on the list of high violent crime rates.

Rockhill was selected via stratified random sampling as one of the original 5 NHs in District 4. Dr. Kotlaja asked us to randomly sample a backup neighborhood from District 4 to replace it.

12.3.6.3 Read in Data

Let’s read in a datafile containing all the KC neighborhoods.

nhdata <- read_excel(here("Data", "KC_NHs.xlsx"))
nhdata
# A tibble: 239 × 3
   NBHID BlockGroup               DistrictOrdinal
   <dbl> <chr>                              <dbl>
 1     0 KC Zoo-Swope Park                      5
 2     1 Columbus Park Industrial               4
 3     2 River Market                           4
 4     3 Quality Hill                           4
 5     4 CBD Downtown                           3
 6     5 Paseo West                             5
 7     6 Hospital Hill                          4
 8     7 Crossroads                             4
 9     8 Westside North                         4
10     9 Westside South                         4
# ℹ 229 more rows

Now we need to randomly select another neighborhood from District 4 to replace Rockhill.

# filter to keep only 34 NHs in District 4
nhdist4 <- nhdata %>% filter(DistrictOrdinal == 4) 

# remove 5 previously selected NHs  
nhdist4 <- subset(nhdist4, !(BlockGroup %in% c("North Hyde Park", "Southmoreland", "River Market", 
                            "Davidson", "Rockhill")))

set.seed(1138)

# randomly select NH in District 4 to replace Rockhill 
nhdist4sub <- nhdist4 %>%
  slice_sample(n=1, replace=FALSE)
nhdist4sub
# A tibble: 1 × 3
  NBHID BlockGroup DistrictOrdinal
  <dbl> <chr>                <dbl>
1    11 Union Hill               4

The replacement NH for District 4 is Union Hill.

Now that we have randomly sampled the replacement NH, we will need to randomly sample 4 primary starting points (SPs) and 9 backup SPs for Union Hill and Oak Park Southwest (the strategic sample replacement).

12.3.6.4 Address data

addsubdata <- read_excel("Addresses_Update Union Hill and Oak Park Southwest.xlsx")
addsubdata

12.3.6.5 Sample 13 SP per Substitute NH (4 original + 9 reserve)

set.seed(1138)

spsubsamp <- addsubdata %>%
  group_by(KCMO_NeighborhoodsCensus.NBHID) %>%
  slice_sample(n=13, replace=FALSE)
spsubsamp

12.3.6.6 Save to CSV file

# write_csv(spsubsamp, "spsubsamp.csv")