The purpose of this assignment is to learn how to download and provide basic descriptions of data and specific variables analyzed in a published study. Up to this point, we have used built-in R data (e.g., R Assignment 2), provided you with the data you were working with (e.g., subsets of the NYS data in R Assignment 3), or you have downloaded the data manually from ICPSR and placed it within your reproducible file structure (e.g., R Assignment 4). The approach we used in the last R Assignment would work fine when you are using your own data and/or data that you have permission to share. However, this is not generally the case with data on ICPSR. According to ICPSR’s bylaws, you are not technically allowed to share ICPSR data in your own online repository (e.g. OSF or GitHub). For this assignment, we will show you how to download SPSS data directly within R and begin looking at the data via basic descriptive statistics.
Specifically, for this assignment, we will:
ifelse function and
logic in R.
I assume that you are now familiar with installing and loading packages in R. Thus, when you see a package being used, I expect that you know it needs to be installed and that it needs to be loaded within your own R session in order to use it.
At this point, I also assume you are familiar with RStudio and with creating R Markdown (RMD) files. If not, please review R Assignments 1 & 2.
As with previous assignments, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste from the instructions except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
In the last R Assignment (“Reproducible File Structure”), we created a basic reproducible file structure and shared it using your computer’s operating system. Here, we are going to create most of the folders we need using R code. This is useful for our specific purpose, downloading data directly from ICPSR, because we do not have to rely on someone placing the data in the correct folder; we can simply share the code that creates the folder in their own root directory.
Before we start creating folders and downloading data within R, we need to create a root folder, save our RMD file inside it, and close and open the assignment directly from that root folder (so the “here” package will start in the correct folder on our computer):
Create a subfolder within your “LastName_CRM495_RAssignment5” subfolder called “NYS_data.” Technically, you could do this yourself by navigating to the folder on your computer and creating a new “NYS_data” folder manually. But we can also do it in R with the following code. Again, doing it in the R environment helps ensure that anyone else (including our future selves) can easily reproduce our work with minimal effort.
# check if "NYS_data" folder exists (TRUE if it does) & create if it does not exist.
ifelse(dir.exists(here("NYS_data")), TRUE, dir.create(here("NYS_data")))
## [1] TRUE
Let us try to explain the above code. The “ifelse” command is a logical function within base R. To get more details about it, type ?ifelse into the console window. Here is the description from its help page: “ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.”
It takes the following form: ifelse(test, yes, no). This means you give R a logical test (or a logical question) that can be answered yes or no, and then it gives you a value or performs another function based on the result of that test (i.e., based upon the answer to that question).
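As a small illustration of the ifelse(test, yes, no) form, here is a minimal sketch using a made-up vector of ages. The vector and the labels are hypothetical and are not taken from the NYS data. Notice that ifelse answers the question separately for every element of the test.

```r
# Hypothetical ages, invented for illustration only (not NYS data)
ages <- c(12, 15, 17, 21)

# test: is each age under 18?   yes: "juvenile"   no: "adult"
age_group <- ifelse(ages < 18, "juvenile", "adult")

# One answer per element of ages
print(age_group)
```

The result has the same length as ages, which is what the help page means by “the same shape as test.”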
In the above code, we are asking if the “NYS_data” folder exists within our root folder (i.e., your “LastName_CRM495_RAssignment5” folder) with the dir.exists function. If the answer is yes, it simply returns the logical value we told it to return, in this case TRUE. If the answer is no, we instruct R to create that “NYS_data” folder with the dir.create function. Again, type ?dir.create for more details.
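Because this particular test only ever has one answer (the folder either exists or it does not), the same logic can also be written with a plain if statement. The sketch below is one possible alternative, not the approach used in this assignment. It works inside R's temporary directory so that it will not touch your assignment folder, and the folder names "demo_root_" and "demo_data" are made up for illustration.

```r
# Build a path inside R's temporary directory (hypothetical folder names)
demo_dir <- file.path(tempfile("demo_root_"), "demo_data")

# First check: the folder should not exist yet
exists_before <- dir.exists(demo_dir)

# Create the folder only if it is missing;
# recursive = TRUE also creates the parent folder
if (!dir.exists(demo_dir)) {
  dir.create(demo_dir, recursive = TRUE)
}

# Second check: the folder should exist now
exists_after <- dir.exists(demo_dir)
print(c(before = exists_before, after = exists_after))
```

Running the if block a second time does nothing, because the test now fails and dir.create is skipped.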
If you want to have some fun, you can actually have R return a text string instead of the logical value. For example:
ifelse(dir.exists(here("NYS_data")), "You already created that folder, dummy!", dir.create(here("NYS_data")))
## [1] "You already created that folder, dummy!"
Generally, it is probably not a great idea to have R call the user (yourself in this case) a “dummy” with code you plan to eventually share publicly. Yet, it is also OK to have some fun when doing science. I (Jake) think that having a computer program call me a “dummy” is fun - perhaps you do not.
Note: there is probably a programming rationale, of which I am unaware, for returning the logical value rather than a string.
Note: tidyverse syntax has a stricter if_else function. According to the documentation, what makes it more strict is that “It checks that true and false are the same type.”
I’ll be honest, I’m not sure exactly when this strictness is useful (tidyverse says it can allow for more predictable use and is somewhat faster). For most of what we will be using it for, either the base ifelse or the tidyverse if_else function will likely work just fine. I am going to use the more general ifelse function from base R.
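To make the difference in strictness a little more concrete, here is a hedged sketch. The inputs are made up. The first part uses only base R; the second part assumes the dplyr package (where if_else lives) is installed and is skipped otherwise.

```r
# Base R ifelse() quietly coerces mixed types to a common one:
# the number 1 comes back as the string "1"
mixed <- ifelse(c(TRUE, FALSE), "a", 1)
print(mixed)
print(class(mixed))

# dplyr's if_else() should refuse the same mix of character and numeric.
# This part only runs if dplyr happens to be installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  strict <- tryCatch(
    dplyr::if_else(c(TRUE, FALSE), "a", 1),
    error = function(e) "if_else stopped with an error"
  )
  print(strict)
}
```

In other words, base ifelse will hand back a result even when the yes and no values do not match in type, while if_else is designed to stop and tell you about the mismatch.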
You should now have a file structure for your “LastName_CRM495_RAssignment5” folder that looks like this:
Now that you have the basic file structure for this assignment, and specifically the “NYS_data” folder, it’s time to download the first five waves of NYS data. Recall that these are the waves of data that Warr (1993) used in his study on delinquent peer influences and the age distribution of crime.
As we mentioned previously, it is technically against ICPSR’s bylaws to share data housed on ICPSR “without the written agreement of ICPSR.” This means that if you included ICPSR data and/or documentation directly within a reproducible file structure that you shared with someone else, you would technically be violating the bylaws. Fortunately, there is a package called “icpsrdata” that allows you to download data housed on ICPSR directly from within R. This means you simply need to provide the code for downloading and wrangling the data, and you are 1) not violating ICPSR’s bylaws and 2) adhering to open and reproducible research practices. Let’s show you how to do that now.
We need to know the ICPSR numbers for the first five waves of the NYS. Go to the NYS Series page on ICPSR and make note of the ICPSR numbers for the first five waves of data.