Assumptions & Ground Rules

The primary purpose of this first project assignment is to familiarize yourself with the National Youth Survey (NYS) and begin to develop some ideas for research topics or questions that you will eventually examine with a replication study.

  • Since I’m requiring you to use the NYS data for your replication and reproducibility project, I’m trying to get you to identify things you are interested in that can potentially be examined using the NYS data. Then, once you know what you could potentially do with the NYS, in a subsequent assignment, I will ask you to identify specific research on that topic, including published studies that have used the NYS.

At this point, all I assume is that you are familiar with RStudio and with creating R Markdown (RMD) files. If not, please review R Assignments 1 & 2. Prior experience with reviewing a codebook and/or variable information on ICPSR will be beneficial, but not necessary as I will try to walk you through those tasks in this assignment.

Part 1: Introducing the National Youth Survey

Goal: Provide a brief introduction to the NYS

The National Youth Survey, overseen by Delbert Elliott, was one of the first national-level longitudinal studies specifically designed to measure self-reported crime, and it has been a popular data source for many high-profile criminological studies. Currently, ICPSR lists 412 total publications. Originally, it was designed to study youth between the ages of 11 and 17 and followed them for a total of five years from 1976 to 1980. Subsequent waves have also been collected at varying intervals (e.g., 1983, 1986, and 1992), and it is now called the National Youth Survey - Family Study (also see here).

ICPSR has the first seven waves of data publicly available and any one or a series of these waves will serve as the data for your replication and reproducibility project for this class.

Here is the general description of the series from ICPSR:

For this series, parents and youth were interviewed about events and behavior of the preceding year to gain a better understanding of both conventional and deviant types of behavior by youths. Data were collected on demographic and socioeconomic status of respondents, disruptive events in the home, neighborhood problems, parental aspirations for youth, labeling, integration of family and peer contexts, attitudes toward deviance in adults and juveniles, parental discipline, community involvement, drug and alcohol use, victimization, pregnancy, depression, use of outpatient services, spouse violence by respondent and partner, and sexual activity. Demographic variables include sex, ethnicity, birth date, age, marital status, and employment of the youths, and information on the marital status and employment of the parents.

In what follows, I will try to walk you through the ICPSR website from where you will ultimately download the data for your project. For the moment, you will only need to access the basic description, variable list, and codebook for the data.

Part 2: ICPSR and the NYS Data Series

Goal: Familiarize yourself with ICPSR and NYS series landing page

It’s worth familiarizing yourself with the ICPSR webpage for a specific dataset or a series of data sets as it has some pretty standard things that can be useful. Here, we’ll start with the NYS Series page and then review the specifics for Wave 1 of the National Youth Survey from 1976.

Here is the landing page for the NYS “series” of data:

ICPSR landing page for National Youth Survey Series

In addition to the general description of the NYS, there are three main tabs of information. First, as you can see above, the “Studies” tab lists and links to each of the studies in the “series.” If you click on any of the specific studies or, in this case, waves of NYS data, it will take you to information for that specific wave.

Second is the “Variables” tab. This includes a searchable list of every variable name along with a brief label and description.

ICPSR variable page for National Youth Survey Series

If you click on the “more options” link below the search pane, it will give you some filters you can use, including filtering on the different waves of data (e.g., so you only get variables from Wave 1). Of course, you could also do this through that specific wave’s landing page.

Finally, there is the “Data-related Publications” tab. This includes a list of every published study, including peer-reviewed journal articles, books, student theses and dissertations, and government reports, that has utilized data from one or more of the first seven waves of the NYS data series. Note that this list is not necessarily exhaustive, but it does seem fairly comprehensive.

Finding Data-Related Publications on ICPSR

Again, if you click on the “more options” link below the search pane, you get some options by which you can filter the list, including the year of publication, publication type (e.g., peer-reviewed journal), specific journal, author, and the specific wave of NYS data examined. Here, I have filtered for articles in Criminology, the flagship journal of the American Society of Criminology and widely considered the top journal in the field of criminology.

Look, Jon's article from 2009 is the last article in Criminology to use the NYS

Part 3: ICPSR and specific waves of the NYS

Goal: Familiarize yourself with Wave 1 of the NYS on ICPSR

Now that you’ve had a chance to look over the NYS series information, let’s look at specific information for Wave 1. When you click that link, it will take you to the landing page for NYS Wave 1 that looks fairly similar to the landing page for the series.

ICPSR landing page for National Youth Survey Wave 1

Note that the landing page for Wave 1 includes “Variables” and “Data-related Publications” tabs similar to the landing page for the series. However, it also includes a “Data & Documentation” tab.

Data & Documentation page for National Youth Survey Wave 1

If you click on the download icon, it will show you the various formats in which the data are available. You’ll notice that the NYS offers many file formats for download, but that R is not one of them. We’ll show you how to download one of these other file formats and import it into R in a future assignment. For right now, go ahead and download the codebook by clicking on the “Codebook [PDF]” link.

Download Codebook for National Youth Survey Wave 1

It is instructive to take a look at what is actually downloaded from ICPSR as this will be important when we get to actually downloading the data. First, notice that what ICPSR does is download a zip file with the title “ICPSR_08375-Codebook.pdf.zip” to your computer.

ICPSR_08375 folder for NYS Wave 1

Opening the zip file reveals a folder titled “ICPSR_08375” inside.

Download folder with zip file for NYS Wave 1

When you open this folder, you find another folder titled “DS0001” inside. This is where you will find the codebook.

  • Note: When we eventually download NYS data, the data will be in the “DS0001” folder along with the codebook. Sometimes a study you download from ICPSR will have multiple data sets associated with it, and each data file will be contained in its own folder. So, for example, if there were a second data set associated with this particular wave of the NYS, there would likely be another folder named “DS0002.”

DS0001 folder for NYS Wave 1

Folder with codebook for NYS Wave 1
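
If you would rather confirm this folder structure from within R, the following is a minimal sketch. It assumes you have already extracted the zip file; the helper name `find_pdfs()` and the example path under “Downloads” are placeholders for illustration, so adjust the path to wherever the “ICPSR_08375” folder sits on your computer.

```r
# List PDF files (e.g., codebooks) found under an extracted ICPSR study folder.
# `study_dir` is an assumed path to the extracted folder, not a fixed location.
find_pdfs <- function(study_dir) {
  if (!dir.exists(study_dir)) {
    stop("Folder not found: ", study_dir)
  }
  # recursive = TRUE also searches subfolders such as DS0001, DS0002, ...
  pdfs <- list.files(study_dir, pattern = "\\.pdf$",
                     recursive = TRUE, ignore.case = TRUE)
  # Report which data set folder each PDF sits in
  data.frame(
    dataset_folder = dirname(pdfs),
    file = basename(pdfs),
    stringsAsFactors = FALSE
  )
}

# Example call with a placeholder path; it only runs if the folder exists.
study_dir <- file.path("~", "Downloads", "ICPSR_08375")
if (dir.exists(study_dir)) {
  print(find_pdfs(study_dir))
}
```

Each row of the result pairs a PDF file with the data set folder that contains it, which makes it easy to spot additional folders such as “DS0002” when a study has more than one data set.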

Go ahead and open the codebook and look at the information included. You’ll first notice it includes information about any data processing by ICPSR and a Table of Contents. Sometimes codebooks will contain more information on the study design and the sampling procedures specifically (see the link posted on the Canvas page for a detailed description of NYS sampling procedures). In this case, the codebook contains information on two surveys that were conducted in Wave 1: (1) the Parent Interview and (2) the Youth Interview. This information includes variable names and descriptions within the dataset, specific question wording, and the original codebook and survey instruments/interview schedules.

We’ll talk more about how to use the codebook to find variables necessary to replicate or reproduce an existing study in future assignments. For right now, I recommend perusing the codebook, paying particular attention to the Table of Contents for the types of information asked of the parents and especially the youth at Wave 1 (a lot of this information is also gathered from the youth in subsequent waves).

  • Note: You can also download and peruse the codebooks for subsequent waves if you are interested in seeing additional topics that were studied in those waves.

Part 4: Identify Potential Topics for Replication & Reproducibility Project

Goal: Identify and Describe Three Potential Research Topics

Throughout this walkthrough you have been exposed to the NYS data and how you can explore the topics it covers through the “Variables” and “Data-related Publications” tabs on ICPSR as well as the specific waves’ codebooks. Your job now is to review these sources and come up with potential research topics for your “Replication & Reproducibility” Project.

  • Note: If it were me, I would go to the NYS Series landing page on ICPSR and review the “Variables” included across the different waves as well as the prior publications that have analyzed the data. This should give me a good idea of the types of topics that have been analyzed with the NYS. I may also peruse the codebook for Wave 1 for some of this information.

  1. Start by creating a new RMarkdown document with the title “CRM 495: Project Phase 1” with the option of knitting to an html file checked.

  2. Create three headers titled “Topic #1”, “Topic #2”, and “Topic #3”, respectively.

  3. Under each header create a bulleted list that provides the following information:

    1. State a potential research topic in a sentence or two.
    2. Explain why you are interested in this topic.
    3. What key variables/survey questions are necessary to study this topic?
    4. Identify any challenges you perceive in studying this topic with the NYS.

  • Note: At this point, we are at the brainstorming stage. I simply want you to think about what you could potentially study using the NYS data. You do not have to commit to any of these topics now; I just want you to start developing ideas. You may also have a research topic that you are particularly interested in from work you have done in a previous class. Feel free to list that for this assignment if you want to explore the feasibility of studying it with the NYS.
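
To make the structure described in steps 1 through 3 concrete, here is a minimal sketch that writes a starter RMD file from R. This is only one way to do it (creating the file through RStudio’s menus works just as well). The helper name `build_skeleton()`, the bullet prompts, and the output file name are my own placeholders rather than required wording.

```r
# Build the lines of a starter R Markdown file for this assignment.
# The bullet prompts below are placeholder wording; rephrase them as you like.
build_skeleton <- function(n_topics = 3) {
  yaml <- c(
    "---",
    'title: "CRM 495: Project Phase 1"',
    "output: html_document",
    "---",
    ""
  )
  prompts <- c(
    "- Topic: ",
    "- Why I am interested: ",
    "- Key variables/survey questions: ",
    "- Anticipated challenges: "
  )
  # One header plus one bulleted list per topic
  topics <- unlist(lapply(seq_len(n_topics), function(i) {
    c(paste0("# Topic #", i), "", prompts, "")
  }))
  c(yaml, topics)
}

# Write the skeleton to a temporary file; swap in your own path and file name.
out_file <- file.path(tempdir(), "Project_Phase1_skeleton.Rmd")
writeLines(build_skeleton(), out_file)
```

Opening the resulting file in RStudio should show the title, the html output option, and three “Topic” headers, each followed by four bullets ready to fill in.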

Part 5: Submit your assignment

  1. Upon completing the tasks in the previous sections, “knit” your final RMD file and save the final knitted html document to your “Assignments” folder in your LastName_CRM495_work folder as: LastName_CRM495_RR-Project-Phase1_YEAR_MO_DY.

  2. Inside the “LastName_CRM495_commit” folder in our shared folder, create another folder named: RR_Project_Phase1.

  3. To submit your assignment for grading, save copies of both (1) your “RR-Project-Phase1” html file and (2) your “RR-Project-Phase1” RMD file into the LastName_CRM495_commit > RR_Project_Phase1 folder. Remember, be sure to save copies of both files. Do not just drag the files over from your “work” folder, or you may lose the original copies in your “work” folder.

  4. Finally, submit your knitted html document on Canvas in the “RR Phase 1” submission portal. This will allow me to have a time-stamped version of your assignment for grading purposes.
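
If you prefer to handle the naming and copying in steps 1 through 3 from R rather than by hand, here is a minimal sketch. It assumes “YEAR_MO_DY” means a four-digit year followed by a two-digit month and day; the helper names `make_stem()` and `copy_to_commit()`, the last name “Smith”, and the paths in the commented example are all hypothetical.

```r
# Build the dated file stem described in step 1.
# Assumes YEAR_MO_DY means a four-digit year, two-digit month, two-digit day.
make_stem <- function(last_name, date = Sys.Date()) {
  paste0(last_name, "_CRM495_RR-Project-Phase1_", format(date, "%Y_%m_%d"))
}

# Copy (not move) files into the commit folder, creating it if needed.
# file.copy() leaves the originals in place in your "work" folder.
copy_to_commit <- function(files, commit_dir) {
  dir.create(commit_dir, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(files, commit_dir, overwrite = TRUE)
  stopifnot(all(ok))
  invisible(file.path(commit_dir, basename(files)))
}

# Hypothetical example; replace the name and paths with your own.
# stem  <- make_stem("Smith")
# files <- file.path("Smith_CRM495_work", "Assignments",
#                    paste0(stem, c(".html", ".Rmd")))
# copy_to_commit(files, file.path("Smith_CRM495_commit", "RR_Project_Phase1"))
```

Because the files are copied rather than moved, the versions in your “work” folder remain untouched after you submit.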