The purpose of this assignment is to learn how to download and provide basic descriptions of data and specific variables analyzed in a published study. Up to this point, we have used built-in R data (e.g., R Assignment 2), provided you with the data you were working with (e.g., subsets of the NYS data in R Assignment 3), or you have downloaded the data manually from ICPSR and placed it within your reproducible file structure (e.g., R Assignment 4). The approach we used in the last R Assignment would work fine when you are using your own data and/or data that you have permission to share. However, this is not generally the case with data on ICPSR. According to ICPSR’s bylaws, you are not technically allowed to share ICPSR data in your own online repository (e.g. OSF or GitHub). For this assignment, we will show you how to download SPSS data directly within R and begin looking at the data via basic descriptive statistics.
Specifically, for this assignment, we will:
... `ifelse` function and ... logic in R.

I assume that you are now familiar with installing and loading packages in R. Thus, when you see a package being used, I expect that you know it needs to be installed and that it needs to be loaded within your own R session in order to use it.
At this point, I also assume you are familiar with RStudio and with creating R Markdown (RMD) files. If not, please review R Assignments 1 & 2.
As with previous assignments, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste from the instructions except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
library(tidyverse)
library(here)
library(haven)
library(icpsrdata)
library(gt)
library(sjmisc)
In the last R Assignment (“Reproducible File structure”) we created a basic reproducible file structure and shared it using your computer’s operating system. Here, we are going to create most of the folders we need using R code. This is useful for our specific purposes (downloading data directly from ICPSR) because we do not have to rely on someone placing their data in the correct folder; we can simply share with them the code to create the folder in their own root directory.
Before we start creating folders and downloading data within R, we need to create a root folder, save our RMD file inside it, and close and open the assignment directly from that root folder (so the “here” package will start in the correct folder on our computer):
Create a subfolder within your “LastName_CRM495_RAssignment5” subfolder called “NYS_data.” Technically, you could do this yourself by navigating to the folder on your computer and creating a new “NYS_data” folder manually. But we can also do it in R with the following code. Again, doing it in the R environment helps ensure that anyone else (including our future selves) can easily reproduce our work with minimal effort.
# check if "NYS_data" folder exists (TRUE if it does) & create if it does not exist.
ifelse(dir.exists(here("NYS_data")), TRUE, dir.create(here("NYS_data")))
## [1] TRUE
Let us try to explain the above code to you. The `ifelse` command is a logical function within base R. To get more details about it, type `?ifelse` into the console window. Here is the description of that function:

ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.

It takes the form of the following: `ifelse(test, yes, no)`. This means, you give R a logical test (or a logical question) that can be answered yes or no and then it gives you a value or performs another function based on the solution of that test (i.e., based upon the answer to that question).

In the above code, we are asking if the “NYS_data” folder exists within our root folder (i.e., your “LastName_CRM495_RAssignment5” folder) with the `dir.exists` function. If the answer is yes, it simply returns the logical value I told it to - in this case `TRUE`. If the answer is no, you instruct R to create that “NYS_data” folder with the `dir.create` function. Again, type `?dir.exists` or `?dir.create` for more information.
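To make the `ifelse(test, yes, no)` form concrete, here is a minimal sketch in base R. The `ages` vector and the `demo_dir` temporary folder are made up for illustration; they are assumptions and not part of the assignment's file structure.

```r
# Made-up ages, purely for illustration (not NYS data).
ages <- c(11, 15, 19)

# ifelse() checks each element of the test and picks from `yes` or `no`,
# so the result has the same length as the test.
age_group <- ifelse(ages >= 18, "adult", "minor")
print(age_group)

# Same folder-creation logic as above, but inside a temporary directory
# (an assumption, so the sketch does not touch your project folders).
demo_dir <- file.path(tempdir(), "NYS_data_demo")
first_run  <- ifelse(dir.exists(demo_dir), TRUE, dir.create(demo_dir))
second_run <- ifelse(dir.exists(demo_dir), TRUE, dir.create(demo_dir))
print(c(first_run, second_run))
```

Running the folder check twice is harmless: the second call finds the folder and simply returns the `yes` value instead of trying to create it again.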
If you want to have some fun, you can actually have R return a text string instead of the logical value. For example:
ifelse(dir.exists(here("NYS_data")), "You already created that folder, dummy!", dir.create(here("NYS_data")))
## [1] "You already created that folder, dummy!"
Generally, it is probably not a great idea to have R call the user (yourself in this case) a “dummy” with code you plan to eventually share publicly. Yet, it is also OK to have some fun when doing science. I (Jake) think that having a computer program call me a “dummy” is fun - perhaps you do not.
Note: there is probably a programming rationale for using the logical value rather than a string of which I am unaware.
Note: tidyverse syntax has a stricter `if_else` function. According to the documentation, what makes it more strict is that “It checks that `true` and `false` are the same type.” I’ll be honest, I’m not sure exactly when this strictness is useful (tidyverse says it can allow for more predictable use and is somewhat faster). For most of what we will be using it for, either the base `ifelse` or the tidyverse `if_else` function will likely work just fine. I am going to use the more general `ifelse` function from base R.
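One place where the difference shows up is when the `yes` and `no` values have different types. The sketch below is only an illustration with made-up inputs, and it assumes the “dplyr” package (installed as part of the tidyverse) is available.

```r
# Base R ifelse() silently coerces mixed types to a common type.
mixed <- ifelse(c(TRUE, FALSE), 1, "a")
print(mixed)  # a character vector: the number 1 became the string "1"

# dplyr's if_else() refuses to combine a number and a string and raises
# an error instead; tryCatch() captures the error message as text.
strict_result <- tryCatch(
  dplyr::if_else(c(TRUE, FALSE), 1, "a"),
  error = function(e) conditionMessage(e)
)
print(strict_result)
```

The silent coercion in the first call is the kind of surprise the stricter function is meant to catch early.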
You should now have a file structure for your “LastName_CRM495_RAssignment5” folder that looks like this:
Now that you have the basic file structure for this assignment and specifically the “NYS_data” folder, it’s time to download the first five waves of NYS data. Recall these are the waves of data that Warr (1993) used in his study on delinquent peer influences and the age distribution of crime.
As we mentioned previously, it is technically against ICPSR’s bylaws to share data housed on ICPSR “without the written agreement of ICPSR.” This means that if you included ICPSR data and/or documentation directly within a reproducible file structure that you shared with someone else, you would technically be violating the bylaws. Fortunately, there is a package called “icpsrdata” that allows you to download data housed on ICPSR directly from within R. This means you simply need to provide the code for downloading and wrangling the data and you are 1) not violating ICPSR’s bylaws and 2) adhering to open and reproducible research practices. Let’s show you how to do that now.
We need to know the ICPSR numbers for the first five waves of the NYS. Go to the NYS Series page on ICPSR and make note of the ICPSR numbers for the first five waves of data.
Here is a table to remind us of the ICPSR numbers.
Warr (1993) NYS Items

Wave | ICPSR |
---|---|
Wave 1 | 8375 |
Wave 2 | 8424 |
Wave 3 | 8506 |
Wave 4 | 8917 |
Wave 5 | 9112 |
In order to use the “icpsrdata” package, you need to install it and load it into your current R environment.
library(icpsrdata)
To actually download the data, we will use the `icpsr_download` function that is a part of the “icpsrdata” package. The core arguments of the function are `file_id` (i.e., the ICPSR numbers) and `download_dir` (the folder on your computer where you want the data files to be downloaded). Let’s just show you the code and then explain it.
ifelse(dir.exists(here("NYS_data", "ICPSR_09112")), TRUE,
       icpsr_download(file_id = c(8375, 8424, 8506, 8917, 9112),
                      download_dir = here("NYS_data")))

Here, the `icpsr_download` function is nested within an `ifelse` function. It is the same logic as the `ifelse` function above when we created the “NYS_data” folder. It first checks to see if the “ICPSR_09112” folder (wave 5, the last wave we are telling R to download) exists in the “NYS_data” folder. Then it returns the logical value `TRUE` if the folder does exist and, if it does not exist, runs the `icpsr_download` command to download the first five waves from ICPSR.
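The effect of this guard can be sketched without touching ICPSR at all. In the snippet below, `fake_download` is a hypothetical stand-in for `icpsr_download` (it only creates an empty folder and counts its calls), and `demo_root` is a temporary folder assumed for the illustration.

```r
# Hypothetical stand-in for icpsr_download(): it only creates an empty
# folder and counts how often it was called. No data are downloaded.
download_calls <- 0
fake_download <- function(download_dir) {
  download_calls <<- download_calls + 1
  dir.create(file.path(download_dir, "ICPSR_09112"), recursive = TRUE)
}

# Temporary folder used only for this sketch (an assumption).
demo_root <- file.path(tempdir(), "download_guard_demo")
unlink(demo_root, recursive = TRUE)  # start from a clean slate

# Run the same guarded call twice, mirroring the ifelse() pattern.
for (i in 1:2) {
  ifelse(dir.exists(file.path(demo_root, "ICPSR_09112")), TRUE,
         fake_download(download_dir = demo_root))
}
print(download_calls)
```

Because the second pass finds the “ICPSR_09112” folder, the stand-in is only called once. The same idea keeps the real download from repeating every time the document is knit.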
Run the `icpsr_download` command to see and respond to the ICPSR username/password prompts. If you have not yet created a free account on ICPSR (you should have already for an earlier project assignment), then you will need to do this on the ICPSR website first. Then, after each prompt in the console, you would put your ICPSR username instead of “your_icpsr_username” (I added that as a placeholder) and your ICPSR password instead of “your_icpsr_password.”

You already read SPSS data into R with the wave 1 data that you downloaded directly from ICPSR in “R Assignment 4.” Now you just need to do it for each of the first five waves of NYS data you just downloaded by telling R where the specific data file is within your file structure. You want to make note of the specific files that were downloaded to the “NYS_data” folder.
Recall from earlier that the actual data are within a folder called “DS0001” within each of the ICPSR folders. You simply want to use the “here” package to tell the `read_spss` function from the “haven” package where to find the SPSS data for each wave of the data. Make sure you pay close attention to which study numbers are associated with each specific wave of data!
nys_w1 <- read_spss(here("NYS_data", "ICPSR_08375", "DS0001", "08375-0001-Data.sav"))
nys_w2 <- read_spss(here("NYS_data", "ICPSR_08424", "DS0001", "08424-0001-Data.sav"))
nys_w3 <- read_spss(here("NYS_data", "ICPSR_08506", "DS0001", "08506-0001-Data.sav"))
nys_w4 <- read_spss(here("NYS_data", "ICPSR_08917", "DS0001", "08917-0001-Data.sav"))
nys_w5 <- read_spss(here("NYS_data", "ICPSR_09112", "DS0001", "09112-0001-Data.sav"))
If you have done everything correctly up to this point, you should have five data sets in your RStudio Environment representing each of the waves 1 through 5 data that we downloaded with the `icpsr_download` function above (named “nys_w1,” “nys_w2,” “nys_w3,” “nys_w4,” and “nys_w5”).
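The five file paths all follow the same pattern, so one way to double-check the study numbers is to build the paths from the ICPSR numbers. This is only a sketch: it uses base R's `file.path` with relative paths rather than `here`, and the object names `icpsr_ids`, `padded`, and `data_paths` are made up for illustration.

```r
# ICPSR study numbers for waves 1-5 (from the table above).
icpsr_ids <- c(8375, 8424, 8506, 8917, 9112)

# Pad each number to five digits, e.g. 8375 becomes "08375".
padded <- sprintf("%05d", icpsr_ids)

# Folder and file names follow the pattern used in the read_spss() calls:
# NYS_data/ICPSR_<id>/DS0001/<id>-0001-Data.sav
data_paths <- file.path("NYS_data", paste0("ICPSR_", padded), "DS0001",
                        paste0(padded, "-0001-Data.sav"))
names(data_paths) <- paste0("nys_w", 1:5)
print(data_paths)
```

Comparing the printed paths against the folders that were actually downloaded is a quick way to catch a mistyped study number before calling `read_spss`.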
In “R Assignment 3” we reproduced Figure 1 from Warr’s (1993) article “Age, Peers, and Delinquency.” Feel free to go back to assignment 3 for a refresher on the article, including a description of the specific variables that Warr (1993) constructed and analyzed. For Figure 1, Warr (1993) plotted the age distribution for the percentage of respondents who reported having no friends who engaged in eight delinquent behaviors in the previous year. Over the next couple of R Assignments, we are going to focus on reproducing and extending Figures 2, 3, and 4 from Warr (1993):
These figures plot the age distribution of 1) Percentage of respondents reporting they average three or more nights per week socializing (i.e. “going on dates, to parties, or other social activities”); 2) Percentage of respondents reporting it was “Very important” or “Pretty Important” to socialize; and 3) Percentage of respondents who reported they “would lie to protect their friends if they got into trouble with the police.”
In order to reproduce these figures we need to:
In what follows, we will walk through the first four of these steps and save the last three for R Assignment 6.
In order to reproduce Figures 2, 3, and 4 from Warr (1993) we need to start by identifying the specific survey items that Warr used to construct those figures. Let’s start with his description of the items in the article. On pages 24-25 he describes the survey questions that are supposed to measure these “other elements of peer relations” by listing the questions respondents were asked:
Warr (1993) provided us with the specific question wording, which is helpful and not always the case in the published literature. Of course, if you open any of the data sets you downloaded above, you won’t see the specific wording of each survey question. Here is what the first six rows of the wave 1 data look like:
head(nys_w1) %>%
  gt()
CASEID | V5 | V6 | V7 | V8 | V9 | V10 | … | V522 |
---|---|---|---|---|---|---|---|---|
1 | 1 | 2 | 2 | 4 | 4 | 4 | … | 1 |
2 | 1 | 2 | 2 | 4 | 4 | 4 | … | 2 |
3 | 1 | 2 | 2 | 4 | 4 | 4 | … | 2 |
4 | 3 | 2 | 2 | 4 | 4 | 5 | … | 1 |
5 | 1 | 2 | 3 | 4 | 3 | 3 | … | 2 |
6 | NA | NA | NA | NA | NA | NA | … | 2 |
Instead of descriptive variable names we get a bunch of columns with variable numbers in the format “V###” (note: if you open up the actual dataset in R you will also see short descriptive variable labels). In order to find the specific survey items Warr (1993) is referring to, we need to go to the codebook and identify the variable numbers.
Note: Because variable numbers are not the same across each wave, this requires going to each of the codebooks and looking them up.
Note: In the above code we use the `gt()` function to simply have the data print out in a nice html formatted table. We will show you how to harness the power of the “gt” package to create publication-ready tables in a later R Assignment.
Fortunately, the NYS codebooks make this relatively easy as they include bookmarks to different sections of the survey. Here is how it looks in the Wave 1 codebook:
Notice on the left that wave 1 included both a “parent interview” and a “youth interview.” If you go to wave 1 and look through the bookmarks, you’ll notice multiple sections of peer measures, including “exposure to delinquent peers” and “commitment to delinquent peers.” However, the particular questions for Warr’s (1993) Figures 2-3 are in the “Social Integration” section while the dependent variable for Figure 4 is in the “commitment to delinquent peers” section.
For wave 1, the key variables we need are:
Since Warr (1993) used the first five waves, to figure out each specific variable used to construct Figures 2-4, you would simply need to go to each codebook and find each of these items. Fortunately for you, we already did this and made this handy table. You’re welcome!
Warr (1993) Figures 2-4 NYS Items

Item | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 |
---|---|---|---|---|---|
ICPSR number¹ | 8375 | 8424 | 8506 | 8917 | 9112 |
Age | V169 | V7 | V10 | V6 | V6 |
Evenings spent socializing | V179 | V17 | V81 | V24 | V37 |
Importance of socializing | V180 | V18 | V82 | V25 | V38 |
Lie to police | V377 | V223 | V321 | V301 | V328 |

¹ Note: indicates the ICPSR number for the data set and not a survey item.
\(~\)
When working on a specific analysis or set of analyses from a large data set, I generally think it’s good practice to create a more manageable data set with just the items you need. This ensures that your raw data is kept intact and that you do not unintentionally make changes to it. In this particular case, it will also allow me to look at the data and see if the changes I am making to it are working (this is not always possible with analyses that utilize many variables).
Let’s start selecting the specific variables we need to reproduce
Warr’s (1993) Figures 2-4. We are going to use the select()
function in the “dplyr”
package (one of the core packages within the tidyverse suite) to select
those specific items in each of the five waves of data. That means, for
the wave 1 data, we need to select “V169,” “V179,” “V180,” and “V377”
(and “V7,” “V17,” “V18,” and “V223” for wave 2, and so on for waves 3
through 5).
nys_w1_trim <- nys_w1 %>%
dplyr::select(V169, V179, V180, V377)
In the code above, we are telling R to select “V169,” “V179,” “V180,” and “V377” from the “nys_w1” data object and create a new data set object called “nys_w1_trim”. Our new object has the same number of observations, but only 4 variables. You can check this by looking in your RStudio “Environment.”
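If you want to confirm this in code rather than in the “Environment” pane, here is a minimal sketch. Since we cannot share the NYS data here, it uses a small made-up stand-in called “toy_w1” (a hypothetical object with invented values, including an invented extra column “V999”) in place of “nys_w1.”

```r
library(dplyr)

# Hypothetical stand-in for nys_w1; all values are made up for illustration
toy_w1 <- tibble::tibble(
  CASEID = 1:3,
  V169 = c(13, 15, 11),
  V179 = c(3, 4, 1),
  V180 = c(3, 3, 5),
  V377 = c(1, 3, 1),
  V999 = c(0, 0, 1)   # an extra column we do not need
)

# Keep only the four items of interest
toy_w1_trim <- toy_w1 %>%
  dplyr::select(V169, V179, V180, V377)

nrow(toy_w1_trim) == nrow(toy_w1)  # TRUE if no observations were dropped
ncol(toy_w1_trim)                  # number of columns that remain
names(toy_w1_trim)                 # names of the columns that remain
```

Running the same three checks on your real “nys_w1_trim” object should tell you the same thing the “Environment” pane does.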
select() is one of those popular commands that frequently poses conflicts when you have several packages loaded at once. So, in this code chunk, we ensured that the select() command was invoked using the “dplyr” package by appending the package name followed by two colons directly in front of the command (i.e., dplyr::select()).
Let’s take a look at the first six observations of the data with the head() function and see what the trimmed data look like.
head(nys_w1_trim) %>%
gt()
V169 | V179 | V180 | V377 |
---|---|---|---|
13 | 3 | 3 | 1 |
15 | 4 | 3 | 3 |
11 | 1 | 5 | 1 |
16 | 2 | 3 | 2 |
14 | 0 | 4 | NA |
11 | 2 | 3 | 3 |
There are a few problems with the wave 1 trimmed data in its current form:
First, the variable names are not informative. They are simply the names in the original data file. Like with naming your computer files, it is usually a good practice to give informative names to your variables (and other R objects; see part 1 of Navarro’s series on “dplyr”). Using meaningful and systematic naming conventions will also be useful when we combine the data sets, since we can assign the same name across each wave before merging or combining them.
Second, the trimmed data we created has no information about which wave these data come from (except in the object name) nor does it include the unique identifier for individuals. If we were just working with the wave 1 data, this would not be a huge problem; also, since Warr (1993) simply pooled all five waves of data, the individual identifiers are less important. Nonetheless, it is generally good practice to preserve such important information.
In order to rename the four variables in which we are currently interested (i.e., age, evenings spent socializing, importance of socializing, and lie to police), we will use the rename() function. To create a new variable indicating the wave of the data, we will use the mutate() function. Both of these functions are part of the “dplyr” package. The rename() function does exactly what it says: it renames existing items (or columns) in a data set, whereas the mutate() function allows us to create new variables (or columns) and to manipulate existing items in the data.
Let’s do this with the wave 1 data again.
Note: In the code below, we write over the nys_w1_trim data that we created earlier. Usually, we avoid writing over objects, as doing so can cause confusion and errors as we try to keep track of what exactly is in the object. Nonetheless, as you will see, we are going to repeat the select() function that we used before, though, this time, that command will be followed by a pipe and additional commands using the rename() and mutate() functions. One of the nice things about the “dplyr” package and the “pipe” (%>%) you learned about in a previous R Assignment is that you can sequentially invoke various commands all at once within the same code chunk.
nys_w1_trim <- nys_w1 %>%
dplyr::select(CASEID, V169, V179, V180, V377) %>%
rename(age = V169,
evsoc = V179,
socimp = V180,
liepolice = V377) %>%
mutate(wave = 1)
A couple things to note about the above code:
First, like before, we are telling R to select “V169,” “V179,”
“V180,” and “V377” from “nys_w1.” However, one key difference this time
is that we immediately followed this with a pipe to a sequential
rename
command, which tells R to subsequently rename
specific items after selecting them. Within the rename
command, we specifically tell R what to rename each variable by invoking
a new name as equal to an old name (i.e. new name = old name).
Second, the rename
command is then followed by
another pipe to a sequential mutate
command that tells R to
create or modify a variable after completing the rename
command. In this case, our mutate
command tells R to create
a new variable named “wave” that equals “1” to indicate the wave of the
data we are working with (i.e. variable_name = value). Since this
command is not conditional (e.g., there is no ifelse
operator), all 1,725 rows or observations (i.e., “cases” or
“respondents”) from NYS wave 1 will include a variable column named
“wave” with a value that equals “1” in each row. And, of course, the
first line of code tells R to assign all of these operations into a new
object called “nys_w1_trim.”
Note: We also included the “CASEID”
item in the select
command above; ICPSR and the NYS made it
easy on us by consistently naming the identifier “CASEID” in each wave
of data.
Note: The mutate()
function can do a lot more than just assign a value to a new variable.
We’ll discuss this more in the next R Assignment.
Here is what the data look like:
head(nys_w1_trim) %>%
gt()
CASEID | age | evsoc | socimp | liepolice | wave |
---|---|---|---|---|---|
1 | 13 | 3 | 3 | 1 | 1 |
2 | 15 | 4 | 3 | 3 | 1 |
3 | 11 | 1 | 5 | 1 | 1 |
4 | 16 | 2 | 3 | 2 | 1 |
5 | 14 | 0 | 4 | NA | 1 |
6 | 11 | 2 | 3 | 3 | 1 |
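If you would like to double-check the result of a select-rename-mutate pipeline without printing a table, a quick sketch like the one below may help. It again uses a made-up stand-in (“toy_w1,” a hypothetical object with invented values and only two of the survey items) rather than the real NYS data.

```r
library(dplyr)

# Hypothetical stand-in for nys_w1; values are invented
toy_w1 <- tibble::tibble(
  CASEID = 1:3,
  V169 = c(13, 15, 11),
  V377 = c(1, 3, NA)
)

toy_w1_trim <- toy_w1 %>%
  dplyr::select(CASEID, V169, V377) %>%
  rename(age = V169,          # new name = old name
         liepolice = V377) %>%
  mutate(wave = 1)            # the single value 1 is recycled to every row

names(toy_w1_trim)            # the renamed columns plus the new wave column
unique(toy_w1_trim$wave)      # every row should carry the same wave value
```

The same two checks, names() and unique(), work on the real “nys_w1_trim” object.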
Now, let’s create the trimmed
data for each of the first five waves of the NYS. Again, we will write
over the nys_w1_trim
data we created above, as we would
usually do all of these commands in the same code chunk. Here is the
table of the items as a reminder:
Warr (1993) Figures 2-4 NYS Items
Item | Variable name | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 |
---|---|---|---|---|---|---|
ICPSR number¹ | | 8375 | 8424 | 8506 | 8917 | 9112 |
Age | age | V169 | V7 | V10 | V6 | V6 |
Evenings spent socializing | evsoc | V179 | V17 | V81 | V24 | V37 |
Importance of socializing | socimp | V180 | V18 | V82 | V25 | V38 |
Lie to police | liepolice | V377 | V223 | V321 | V301 | V328 |
¹ Note: Indicates the ICPSR number for the data set, not a survey item.
Here is the code to create five trimmed data objects corresponding to each of the first five waves of NYS data.
#Wave 1:
nys_w1_trim <- nys_w1 %>%
dplyr::select(CASEID, V169, V179, V180, V377) %>%
rename(age = V169,
evsoc = V179,
socimp = V180,
liepolice = V377) %>%
mutate(wave = 1)
head(nys_w1_trim) %>%
gt()
#Wave 2:
nys_w2_trim <- nys_w2 %>%
dplyr::select(CASEID, V7, V17, V18, V223) %>%
rename(age = V7,
evsoc = V17,
socimp = V18,
liepolice = V223) %>%
mutate(wave = 2)
head(nys_w2_trim) %>%
gt()
#Wave 3:
nys_w3_trim <- nys_w3 %>%
dplyr::select(CASEID, V10, V81, V82, V321) %>%
rename(age = V10,
evsoc = V81,
socimp = V82,
liepolice = V321) %>%
mutate(wave = 3)
head(nys_w3_trim) %>%
gt()
#Wave 4:
nys_w4_trim <- nys_w4 %>%
dplyr::select(CASEID, V6, V24, V25, V301) %>%
rename(age = V6,
evsoc = V24,
socimp = V25,
liepolice = V301) %>%
mutate(wave = 4)
head(nys_w4_trim) %>%
gt()
#Wave 5:
nys_w5_trim <- nys_w5 %>%
dplyr::select(CASEID, V6, V37, V38, V328) %>%
rename(age = V6,
evsoc = V37,
socimp = V38,
liepolice = V328) %>%
mutate(wave = 5)
head(nys_w5_trim) %>%
gt()
In the code above, we simply kept the same basic code that we used for wave 1 earlier, but replaced the item information with the appropriate items that we identified previously for each of the other four waves of NYS data. Now, you should have five separate “trimmed” data objects, each with the same six variables in them (“CASEID,” “age,” “evsoc,” “socimp,” “liepolice,” and “wave”).
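Because the five code chunks differ only in the variable numbers and the wave number, the repeated steps could also be wrapped in a small helper function. The sketch below is one possible way to do that and is not part of the original assignment code: “trim_wave()” and “toy_w1” are hypothetical names we made up for illustration, and the toy values are invented. It relies on the fact that rename() accepts a named character vector wrapped in all_of(), where the names are the new names and the values are the old names.

```r
library(dplyr)

# Hypothetical helper: select CASEID plus the listed items, rename them, add wave
trim_wave <- function(data, vars, wave_number) {
  data %>%
    dplyr::select(CASEID, all_of(unname(vars))) %>%  # keep ID and the old names
    rename(all_of(vars)) %>%                         # names(vars) become new names
    mutate(wave = wave_number)
}

# Wave 1 lookup taken from the items table (new name = "old name")
w1_vars <- c(age = "V169", evsoc = "V179", socimp = "V180", liepolice = "V377")

# Hypothetical stand-in for nys_w1; values are invented
toy_w1 <- tibble::tibble(
  CASEID = 1:2,
  V169 = c(13, 15),
  V179 = c(3, 4),
  V180 = c(3, 3),
  V377 = c(1, 3),
  V500 = c(9, 9)   # an extra column that should be dropped
)

toy_w1_trim <- trim_wave(toy_w1, w1_vars, wave_number = 1)
names(toy_w1_trim)
```

With the real data, you would build one lookup vector per wave from the table above and call the helper five times. Typing out the five chunks by hand, as we did, is perfectly fine too; the helper just reduces the chance of a copy-and-edit mistake.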
Now that we have all five data sets with the same variables and variable names, pooling them together is relatively easy. But first, we want you to try to build some intuition regarding what Warr (1993) did here when he says that “…all five years of the NYS data were pooled, producing a composite sample of 8,625 persons aged 11-21 (pg. 20).”
Because we created five trimmed data sets with the same variables in
section 3.3 above, “pooling” the data in this case really just means
stacking the waves of data on top of each other. In other words, I want
to put Wave 1 data on top, Wave 2 data next, then Wave 3 data, then Wave
4 data, until the Wave 5 observations are at the bottom of the data set.
Fortunately, the “dplyr” package makes this relatively easy with the bind_rows()
function. Essentially, when you use the bind_rows()
function, you are simply telling R which data sets to stack on top of
each other by the order in which you list them.
Note: There is also a bind_cols() function that tells R to place (columns of) data sets next to each other.
nys_fwtrim <- bind_rows(nys_w1_trim, nys_w2_trim, nys_w3_trim, nys_w4_trim, nys_w5_trim)
head(nys_fwtrim) %>%
gt()
In the code above, we told R to stack waves 1 through 5 on top of each other in chronological order and assign it to the object “nys_fwtrim” (we used fw to indicate we were creating a data set that included all “five waves” of data).
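To build intuition for what stacking does, here is a small sketch with two made-up “waves” (“toy_w1” and “toy_w2” are hypothetical objects with invented values). The pooled object should have as many rows as the two pieces combined, in the order we listed them.

```r
library(dplyr)

# Two hypothetical trimmed waves with identical column names
toy_w1 <- tibble::tibble(CASEID = 1:3, age = c(13, 15, 11), wave = 1)
toy_w2 <- tibble::tibble(CASEID = 1:3, age = c(14, 16, 12), wave = 2)

# Stack wave 1 on top of wave 2
toy_pooled <- bind_rows(toy_w1, toy_w2)

nrow(toy_pooled)             # rows from wave 1 plus rows from wave 2
toy_pooled %>% count(wave)   # how many rows each wave contributed
```

The same logic explains Warr’s (1993) composite sample: if each of the five waves contributes 1,725 rows, the pooled data will have 5 x 1,725 = 8,625 rows. Running count(wave) on “nys_fwtrim” is a quick way to check this.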
You should now have a pooled data set called “nys_fwtrim” that has 8,625 observations and six variables that have informative names. An important first step in analyzing data is looking at basic descriptives for your key variables. This is often the first step in identifying the basic distribution of variables, identifying outliers, and identifying potential problems in the data (e.g., missing data).
If you look again at Figures 2-4, you will notice that Warr (1993) reports, by age, the “Percentage of Respondents…” who:
Each of these variables was dichotomized from specific survey questions with more than two response categories. One reason for this may be that the data are fairly skewed, or concentrated in certain answer categories. For example, it is likely pretty rare for teenage respondents to report averaging seven nights a week socializing via things like dates and parties. Of course, we’ll be able to confirm this below.
We can check the frequency distributions and modal categories for each of the three peer items by creating a frequency table for them. There are lots of ways to create frequency tables and calculate and produce tables of descriptive statistics (see here for review), including the base R command table(), the more tidyverse-oriented command tabyl() that is part of the “janitor” package, and the frq() and flat_table() commands in the “sjmisc” package.
Note: If you use the base R command, you’ll need to use the $ operator to tell R which variable to use from the data set, e.g., table(nys_fwtrim$evsoc). For now, we’ll use the “sjmisc” package.
Note: We also include the frequency table for age.
library(sjmisc)
nys_fwtrim %>%
frq(age, evsoc, socimp, liepolice)
##
## age <numeric>
## # total N=8625 valid N=8625 mean=15.87 sd=2.40
##
## Value | N | Raw % | Valid % | Cum. %
## ---------------------------------------
## 11 | 252 | 2.92 | 2.92 | 2.92
## 12 | 509 | 5.90 | 5.90 | 8.82
## 13 | 778 | 9.02 | 9.02 | 17.84
## 14 | 1036 | 12.01 | 12.01 | 29.86
## 15 | 1289 | 14.94 | 14.94 | 44.80
## 16 | 1276 | 14.79 | 14.79 | 59.59
## 17 | 1216 | 14.10 | 14.10 | 73.69
## 18 | 947 | 10.98 | 10.98 | 84.67
## 19 | 689 | 7.99 | 7.99 | 92.66
## 20 | 436 | 5.06 | 5.06 | 97.72
## 21 | 197 | 2.28 | 2.28 | 100.00
## <NA> | 0 | 0.00 | <NA> | <NA>
##
##
## Y5-37:EVENINGS/WK SPENT DATING/SOCIAL (evsoc) <numeric>
## # total N=8625 valid N=8029 mean=2.08 sd=1.57
##
## Value | Label | N | Raw % | Valid % | Cum. %
## -------------------------------------------------------------
## 0 | Less than once a wk | 1288 | 14.93 | 16.04 | 16.04
## 1 | 1 | 1899 | 22.02 | 23.65 | 39.69
## 2 | 2 | 2068 | 23.98 | 25.76 | 65.45
## 3 | 3 | 1463 | 16.96 | 18.22 | 83.67
## 4 | 4 | 677 | 7.85 | 8.43 | 92.10
## 5 | 5 | 386 | 4.48 | 4.81 | 96.91
## 6 | 6 | 111 | 1.29 | 1.38 | 98.29
## 7 | 7 | 137 | 1.59 | 1.71 | 100.00
## <NA> | <NA> | 596 | 6.91 | <NA> | <NA>
##
##
## Y1-15: HOW IMPORTANT SOCIAL (socimp) <numeric>
## # total N=8625 valid N=8028 mean=3.31 sd=1.13
##
## Value | Label | N | Raw % | Valid % | Cum. %
## ------------------------------------------------------------
## 1 | Not important | 474 | 5.50 | 5.90 | 5.90
## 2 | Not too important | 1460 | 16.93 | 18.19 | 24.09
## 3 | Somewhat important | 2543 | 29.48 | 31.68 | 55.77
## 4 | Pretty important | 2176 | 25.23 | 27.11 | 82.87
## 5 | Very important | 1375 | 15.94 | 17.13 | 100.00
## <NA> | <NA> | 597 | 6.92 | <NA> | <NA>
##
##
## Y1-191: WILLING TO LIE (liepolice) <numeric>
## # total N=8625 valid N=7638 mean=1.59 sd=0.77
##
## Value | Label | N | Raw % | Valid % | Cum. %
## ----------------------------------------------------
## 1 | No | 4469 | 51.81 | 58.51 | 58.51
## 2 | Don't know | 1830 | 21.22 | 23.96 | 82.47
## 3 | Yes | 1338 | 15.51 | 17.52 | 99.99
## 4 | 4 | 1 | 0.01 | 0.01 | 100.00
## <NA> | <NA> | 987 | 11.44 | <NA> | <NA>
As you can see above, the frq()
command in the “sjmisc”
package prints out some pretty bare bones tables of the frequency
distributions for our key variables. One thing we particularly like
about this command is it allows us to include all of the variables for
which we want frequency tables in the same command. Thus, it is a good
command for getting a quick look at the key variables in your data.
We also like that the frq() command defaults to printing out the frequency of missing observations for each variable (the “<NA>” row).
Note: One thing we do
not like about the frq()
command in the
“sjmisc” package is that taking the outputted tables above and
converting them to a tidy data frame (i.e., tibble) for using with
tidyverse packages like “ggplot2” is not as easy as with other
commands.
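If you do need a frequency table as a tidy data frame, one alternative is to build it yourself with the count() and mutate() functions from “dplyr.” The sketch below is just one way to do this, not a replacement for frq(); “toy” is a hypothetical data set with invented values, and “raw_pct” is a column name we made up.

```r
library(dplyr)

# Hypothetical data with some missing values; values are invented
toy <- tibble::tibble(liepolice = c(1, 1, 2, 3, NA, 1, 2, NA))

toy_freq <- toy %>%
  count(liepolice) %>%                           # one row per value; NA gets its own row
  mutate(raw_pct = round(100 * n / sum(n), 2))   # percent of all rows, missing included

toy_freq
```

Because “toy_freq” is an ordinary tibble, it can be passed along to other tidyverse functions, which is the step that is less convenient with the printed frq() output.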
Assignment: Take some time to look over the distributions for our key variables above. What do you notice about each variable’s distribution (e.g., where do most respondents fall in the distribution, what is the most common answer across the five waves, which items have more missing data, etc.)? From the above tables, do you notice any potential problems with the items (e.g., values that don’t make sense based on description of variable in the codebook)? Create a sub-header in your RMD file and write out your responses to these questions.
Warr (1993) was fundamentally interested in the age distribution of
these “Other elements of peer relations.” Essentially, Figures 2 through
4 are simply presenting cross-tabulations of Age by dichotomized
versions of the variables for which we just looked at frequency tables.
Although we’ll save dichotomizing the variables for the next assignment,
we can produce basic cross-tabulation for age and the non-dichotomized
versions of the variables relatively easily using the flat_table()
function in the “sjmisc” package.
nys_fwtrim %>%
flat_table(evsoc, age, margin = "col") #note: margin = "col" tells it to give me column percentages
## age 11 12 13 14 15 16 17 18 19 20 21
## evsoc
## Less than once a wk 40.80 41.25 31.36 23.09 15.36 10.52 7.09 4.88 6.47 6.32 10.30
## 1 28.80 31.79 30.70 29.72 25.10 22.45 17.80 15.81 19.24 19.74 25.45
## 2 18.00 14.89 21.74 22.19 26.67 27.96 30.82 27.11 26.87 30.00 32.12
## 3 8.40 7.24 8.83 14.36 18.66 19.20 22.76 25.68 23.05 24.47 21.21
## 4 2.40 3.22 3.69 5.82 8.01 10.60 10.72 11.89 12.77 10.26 4.85
## 5 0.80 1.41 2.50 3.31 3.22 6.43 6.38 8.44 6.30 5.79 3.64
## 6 0.00 0.00 0.79 0.60 0.99 1.00 2.13 3.09 2.82 1.84 0.61
## 7 0.80 0.20 0.40 0.90 1.98 1.84 2.30 3.09 2.49 1.58 1.82
nys_fwtrim %>%
flat_table(socimp, age, margin = "col")
## age 11 12 13 14 15 16 17 18 19 20 21
## socimp
## Not important 12.40 13.28 9.59 7.43 4.71 4.59 2.92 3.69 3.98 5.00 6.67
## Not too important 24.00 26.76 24.05 19.98 17.84 17.46 15.32 12.75 15.75 13.68 20.00
## Somewhat important 25.60 25.55 30.22 31.73 32.62 31.41 28.52 34.56 37.48 36.32 35.76
## Pretty important 25.20 22.33 22.08 25.20 27.09 29.16 31.53 29.32 26.04 28.16 24.24
## Very important 12.80 12.07 14.06 15.66 17.75 17.38 21.70 19.67 16.75 16.84 13.33
nys_fwtrim %>%
flat_table(liepolice, age, margin = "col")
## age 11 12 13 14 15 16 17 18 19 20 21
## liepolice
## No 74.77 69.80 63.95 61.06 56.93 55.58 53.12 56.85 53.18 58.31 63.80
## Don't know 15.89 20.81 24.32 22.87 24.49 23.79 26.28 22.74 26.76 25.33 22.70
## Yes 9.35 9.40 11.59 16.06 18.58 20.63 20.60 20.42 20.07 16.36 13.50
## 4 0.00 0.00 0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Above we produced three basic tables with our three “other elements
of peer relations” items representing the rows and age representing the
columns. We used the margin = "col"
argument in the
flat_table()
function to get the percentage of each
response category for each age group across all five waves of data.
Again, this is fundamentally what Figures 2-4 in Warr’s (1993) paper are
doing, only here we’re showing this for the raw/untransformed items
rather than the dichotomized versions that Warr (1993) created.
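To see what “column percentages” mean, it can help to compute them by hand on a tiny example. The sketch below uses base R’s table() and prop.table() functions instead of flat_table(), so treat it as a rough equivalent rather than what “sjmisc” does internally; “toy” is a hypothetical data set with invented values.

```r
# Hypothetical data; values are invented for illustration
toy <- data.frame(
  evsoc = c(0, 1, 3, 0, 4, 2, 3),
  age   = c(11, 11, 11, 12, 12, 12, 12)
)

# Cross-tabulate: evsoc categories in rows, ages in columns
counts <- table(toy$evsoc, toy$age, dnn = c("evsoc", "age"))

# margin = 2 divides each cell by its column total (i.e., within each age)
col_pct <- 100 * prop.table(counts, margin = 2)

round(col_pct, 2)
colSums(col_pct)   # each age column should total 100

# Percent of 11-year-olds in the "3 or more evenings" categories
sum(col_pct[as.numeric(rownames(col_pct)) >= 3, "11"])
```

Adding up the rows of interest within a single age column, as in the last line, is the same logic we would use to collapse the response categories into two groups.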
Note: For example, summing the age 11 column percentages for three or more evenings per week gives 8.4 + 2.4 + 0.8 + 0 + 0.8 = 12.4. This appears to correspond to the value at age 11 in Warr’s (1993) Figure 2 plot.
In order for you to demonstrate that you can apply the basic data wrangling and descriptive analysis skills that you learned above on your own, in the last part of the assignment, you will consider alternative operationalizations of one of the “other elements of peer relations” that Warr (1993) was examining in Figures 2-4. In doing so, you will provide a type of robustness check to one of Warr’s (1993) methodological decisions.
Specifically, Warr (1993) used the question about being willing to “lie to protect their friends if they got in trouble with the police” as an indicator of respondents’ “commitment or loyalty to their own particular set of friends (pg. 19).” However, in the section on “Commitment to Delinquent Peers” in the codebooks for the first five waves of NYS data, there are two other questions that are meant to measure “commitment” to peers who are engaging in delinquency:
These were in addition to the question Warr (1993) examined:
Here is a table, similar to what I provided above, that shows you where each item is located in each of the first five waves of NYS data:
Warr (1993) Figures 2-4 NYS Items
Item | Wave 1 | Wave 2 | Wave 3 | Wave 4 | Wave 5 |
---|---|---|---|---|---|
ICPSR number¹ | 8375 | 8424 | 8506 | 8917 | 9112 |
Age | V169 | V7 | V10 | V6 | V6 |
Still run around with friends | V375 | V221 | V319 | V299 | V326 |
Try to stop activities | V376 | V222 | V320 | V300 | V327 |
Lie to police | V377 | V223 | V321 | V301 | V328 |
¹ Note: Indicates the ICPSR number for the data set, not a survey item.
In order to complete the assignment, here is what you need to do:
Before looking at the data, write a brief statement or commentary about whether you think the other two “commitment to delinquent peers” items will have a similar age distribution to the “lie to police” item for which you already produced the descriptive table.
Trim, rename, and pool waves 1-5 data so that you have all three “commitment to delinquent peers” items in the same pooled data set.
Produce frequency tables for each of the “commitment to delinquent peers” items as well as cross-tabulations for these items by age.
Write a brief statement or commentary about the similarities and differences between each of the “commitment to delinquent peers” items in terms of their raw frequency distribution and their age distribution.
Write a “Conclusion” section where you write about what you learned in this assignment and any problems or issues you had in completing it.
“Knit” your final RMD file to html format and save it using an informative file name (e.g., “LastName_CRM495_RAssign5_YEAR_MO_DY”) within a file structure you create for this assignment (e.g., “LastName_CRM495_RAssign5”).
Submit your knitted html file on Canvas.
Place a copy of your root folder (your “LastName_495_commit” folder) on OneDrive.