The primary purpose of this first project assignment is to familiarize yourself with the National Youth Survey (NYS) and begin to develop some ideas for research topics or questions that you will eventually examine with a replication study.
At this point, all I assume is that you are familiar with RStudio and with creating R Markdown (RMD) files. If not, please review R Assignments 1 & 2. Prior experience with reviewing a codebook and/or variable information on ICPSR will be beneficial, but not necessary as I will try to walk you through those tasks in this assignment.
The National Youth Survey, overseen by Delbert Elliott, was one of the first national-level longitudinal studies specifically designed to measure self-reported crime and has been a popular data source for many high-profile criminological studies. Currently, ICPSR lists 412 total publications using these data. Originally, it was designed to study youth between the ages of 11 and 17 and followed them for a total of five years from 1976 to 1980. Subsequent waves have also been collected at varying intervals (e.g., 1983, 1986, and 1992), and it is now called the National Youth Survey - Family Study.
ICPSR has made the first seven waves of data publicly available, and any one wave or a series of these waves will serve as the data for your replication and reproducibility project for this class.
Here is the general description of the series from ICPSR:
For this series, parents and youth were interviewed about events and behavior of the preceding year to gain a better understanding of both conventional and deviant types of behavior by youths. Data were collected on demographic and socioeconomic status of respondents, disruptive events in the home, neighborhood problems, parental aspirations for youth, labeling, integration of family and peer contexts, attitudes toward deviance in adults and juveniles, parental discipline, community involvement, drug and alcohol use, victimization, pregnancy, depression, use of outpatient services, spouse violence by respondent and partner, and sexual activity. Demographic variables include sex, ethnicity, birth date, age, marital status, and employment of the youths, and information on the marital status and employment of the parents.
In what follows, I will try to walk you through the ICPSR website from where you will ultimately download the data for your project. For the moment, you will only need to access the basic description, variable list, and codebook for the data.
It’s worth familiarizing yourself with the ICPSR webpage for a specific dataset or a series of datasets, as these pages share a fairly standard layout with several useful features. Here, we’ll start with the NYS Series page and then review the specifics for Wave 1 of the National Youth Survey from 1976.
Here is the landing page for the NYS “series” of data:
In addition to the general description of the NYS, there are three main tabs of information. First, as you can see above, the “Studies” tab lists and links to each of the studies in the “series.” If you click on any of the specific studies or, in this case, waves of NYS data, it will take you to information for that specific wave.
Second is the “Variables” tab. This includes a searchable list of every variable name along with a brief label and description.
If you click on the “more options” link below the search pane, it will give you some filters you can use, including filtering on the different waves of data (e.g., so you only get variables from Wave 1). Of course, you could also do this through that specific wave’s landing page.
Finally, there is the “Data-related Publications” tab. This includes a list of every published study, including peer-reviewed journal articles, books, student theses and dissertations, and government reports, that has utilized data from one or more of the first seven waves of the NYS data series. Note that this list is not necessarily exhaustive, but it does seem fairly comprehensive.
Again, if you click on the “more options” link below the search pane, you get some options by which you can filter the list, including the year of publication, publication type (e.g., peer-reviewed journal), specific journal, author, and the specific wave of NYS data examined. Here, I have filtered for articles in Criminology, the flagship journal of the American Society of Criminology and widely considered the top journal in the field of criminology.
Now that you’ve had a chance to look over the NYS series information, let’s look at specific information for Wave 1. When you click that link, it will take you to the landing page for NYS Wave 1 that looks fairly similar to the landing page for the series.
Note that the landing page for Wave 1 includes “Variables” and “Data-related Publications” tabs similar to the landing page for the series. However, it also includes a “Data & Documentation” tab.
It is instructive to look at what is actually downloaded from ICPSR, as this will be important when we get to downloading the data. First, notice that ICPSR downloads a zip file named “ICPSR_08375-Codebook.pdf.zip” to your computer.
Opening the zip file reveals a folder titled “ICPSR_08375” inside.
When you open this folder, you find another folder titled “DS0001” inside. This is where you will find the codebook.
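If you would rather inspect the download from R than click through folders, a minimal sketch along the following lines should work. It assumes the zip file was saved to a “Downloads” folder in your home directory (adjust `zip_path` to wherever your browser actually saved it), and `list_zip_files()` is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper: return the file paths stored inside a zip archive,
# or an empty character vector if the archive is not where we expect it.
list_zip_files <- function(zip_path) {
  if (!file.exists(zip_path)) {
    message("Zip file not found at: ", zip_path)
    return(character(0))
  }
  # list = TRUE reads the archive's table of contents without extracting it
  unzip(zip_path, list = TRUE)$Name
}

# Assumed download location; change this to match your computer.
zip_path <- file.path(path.expand("~"), "Downloads",
                      "ICPSR_08375-Codebook.pdf.zip")

# Show what is inside the archive (the nested ICPSR_08375 and DS0001 folders)
print(list_zip_files(zip_path))

# Extract the archive next to the zip file, but only if it was found
if (file.exists(zip_path)) {
  unzip(zip_path, exdir = dirname(zip_path))
}
```

Listing the contents first is optional, but it lets you confirm the nested folder structure before anything is written to disk.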
Go ahead and open the codebook and look at the information included. You’ll first notice it includes information about any data processing by ICPSR and a Table of Contents. Sometimes codebooks will contain more information on the study design and the sampling procedures specifically (see link posted on Canvas page for detailed description of NYS sampling procedures). In this case, the codebook contains information on two surveys that were conducted in Wave 1: (1) the Parent Interview and (2) the Youth Interview. This information includes the variable names and descriptions within the dataset, the specific question wording, and the original codebook and survey instruments/interview schedules.
We’ll talk more about how to use the codebook to find variables necessary to replicate or reproduce an existing study in future assignments. For right now, I recommend perusing the codebook, paying particular attention to the Table of Contents for the types of information that were asked of the parents and especially the youth at Wave 1 (a lot of this information is also gathered from the youth in subsequent waves).
Throughout this walkthrough you have been exposed to the NYS data and how you can explore the topics it covers through the “Variables” and “Data-related Publications” tabs on ICPSR as well as the specific waves’ codebooks. Your job now is to review these sources and come up with potential research topics for your “Replication & Reproducibility” Project.
Start by creating a new R Markdown document with the title “CRM 495: Project Phase 1” and with the option to knit to an html file selected.
Create three headers titled “Topic #1”, “Topic #2”, and “Topic #3”, respectively.
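For reference, the bare file described so far might look something like the output of the sketch below. You would normally create the file through RStudio's New File menu; this code simply writes an equivalent skeleton to a temporary folder so you can see its structure. The first-level `#` headers and the file name `RR-Project-Phase1.Rmd` are assumptions made for illustration.

```r
# Lines of a bare-bones R Markdown file: a YAML header with the required
# title and html output, followed by the three topic headers.
skeleton <- c(
  "---",
  "title: \"CRM 495: Project Phase 1\"",
  "output: html_document",
  "---",
  "",
  "# Topic #1",   # header level (#) is an assumption
  "",
  "# Topic #2",
  "",
  "# Topic #3"
)

# Write to a temporary folder so nothing in your work folder is overwritten.
rmd_path <- file.path(tempdir(), "RR-Project-Phase1.Rmd")
writeLines(skeleton, rmd_path)

# Print the file back to confirm its contents.
cat(readLines(rmd_path), sep = "\n")
```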
Under each header, create a bulleted list that provides the following information:
After completing the tasks in the previous section, “knit” your final RMD file and save the final knitted html document to your “Assignments” folder in your LastName_CRM495_work folder as: LastName_CRM495_RR-Project-Phase1_YEAR_MO_DY.
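If you want R to build the date-stamped file name for you, a short sketch like the following may help. It assumes that YEAR_MO_DY means a four-digit year followed by a two-digit month and day, and `"LastName"` is a placeholder for your own last name.

```r
# Placeholder; replace with your own last name.
last_name <- "LastName"

# Today's date formatted as YEAR_MO_DY (assumed to mean e.g. 2024_01_31).
date_stamp <- format(Sys.Date(), "%Y_%m_%d")

# Assemble the file name described in the instructions above.
file_stem <- paste0(last_name, "_CRM495_RR-Project-Phase1_", date_stamp)
html_name <- paste0(file_stem, ".html")

print(html_name)
```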
Inside the “LastName_CRM495_commit” folder in our shared folder, create another folder named: RR_Project_Phase1.
To submit your assignment for grading, save copies of both (1) your “RR-Project-Phase1” html file and (2) your “RR-Project-Phase1” RMD file into the LastName_CRM495_commit > RR_Project_Phase1 folder. Remember to save copies of both files. Do not just drag the files over from your “work” folder, or you may lose the original copies in your “work” folder.
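One way to make sure you copy the files rather than move them is to do it from R. The sketch below is only an illustration: `copy_to_commit()` is a hypothetical helper, and the demo uses empty stand-in files inside a temporary folder so it can run anywhere. In practice you would point `files` and `commit_dir` at your real work and commit folders.

```r
# Hypothetical helper: copy files into a commit folder, creating the folder
# if needed. file.copy() leaves the originals in place.
copy_to_commit <- function(files, commit_dir) {
  stopifnot(all(file.exists(files)))
  dir.create(commit_dir, recursive = TRUE, showWarnings = FALSE)
  file.copy(from = files, to = commit_dir, overwrite = TRUE)
}

# Demo with stand-in folders and empty files under tempdir() (assumed paths).
work_dir   <- file.path(tempdir(), "LastName_CRM495_work", "Assignments")
commit_dir <- file.path(tempdir(), "LastName_CRM495_commit", "RR_Project_Phase1")
dir.create(work_dir, recursive = TRUE, showWarnings = FALSE)

files <- file.path(work_dir, c("RR-Project-Phase1.html", "RR-Project-Phase1.Rmd"))
file.create(files)

copied <- copy_to_commit(files, commit_dir)

print(copied)              # one TRUE/FALSE per file copied
print(file.exists(files))  # the originals should still be in the work folder
```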
Finally, submit your knitted html document on Canvas in the “RR Phase 1” submission portal. This will allow me to have a time-stamped version of your assignment for grading purposes.