Assumptions & Ground Rules

The purpose of this second assignment is to help you begin to explore your data in R and to do so within an R Markdown document. The specific activities were inspired by the SPSS Exercises from the end of Chapter 1 in Bachman and Paternoster’s Statistics for Criminology & Criminal Justice, 4th Ed.

First, you will learn to create an R Markdown file in which to save and present your work for this class.

As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).

  • Early on, you may have a lot of trouble getting your code to run due to minor typos. This is normal.
  • Remember, you are learning to read and write a new (coding) language. As with learning any new language, we learn from practice and from correcting our mistakes.

Part 1 (Assignment 2.1)

Goal: Create a new R Markdown file in which to complete Assignment 2.

(Note: Remember that, when following instructions, always substitute your own last name for “LastName” and the actual date for YEAR_MO_DY. E.g., Brauer_P680_Assign2_2021_08_30)

In the first assignment, you learned about writing and running R code and making comments in an R Script file. You also saw how running certain commands (e.g., read_spss) from an R Script file will generate results in the RStudio Console and learned how to save the results of such commands into an R object. For Assignment 2, you will learn to create a new R Markdown file, and you will complete the remainder of your assignment in that file.

  • Like an R Script file, an R Markdown file can be used to write and run R code. However, an R Markdown file also can do much more than that. For instance, you can write and edit text, write and run R code, and generate statistical results and plots directly in the RMarkdown file. You can even create entire books and webpages using R Markdown. In fact, both this assignment and the first one were created using R Markdown.
  • R Markdown is an essential tool for producing reproducible research because, with it, we can thoroughly document and simultaneously provide detailed explanations for all of our coding decisions in a project - from opening and manipulating data, to recoding and combining variables, to summarizing and analyzing data, to creating and modifying figures.
  • We will start by simply opening and saving a new R Markdown file. For more detailed instructions, check out Danielle Navarro’s video on creating a new R Markdown file.
  1. Go to your P680_work folder, which should contain the R Script file you created for Assignment 1 (named LastName_P680_Assign1_RScript_YEAR_MO_DY). Click to open the R Script file.
    • Remember, we open RStudio in this way so the here package will automatically set our P680_work folder as the top-level directory.
  2. In RStudio, go to File > New File > R Markdown to open a new R Markdown document.
    Opening a new R Markdown file

  3. The dialogue box asks for a Title, an Author, and a Default Output Format for your new R Markdown file.
    1. In the Title box, enter P680 Assignment 2.
    2. In the Author box, enter your First and Last Name (e.g., Jon Brauer).
    3. Under Default Output Format, select “Word document” (HTML is usually the default selection).
      • (Note: You must have Microsoft Word installed for this to work properly. IU students can install Word for free.)
  4. Click OK to create your new R Markdown file. It should look like this:
    Your new R Markdown file

  5. The new R Markdown file contains a simple pre-populated template to show users how to do basic tasks like add settings, create text headings and text, insert R code chunks, and create plots. Feel free to read through the template - you may find it helpful. Personally, I find the template a little distracting and a bit overwhelming for new users. So, we are going to delete everything after the metadata and second set of three dashes (i.e., after the YAML header).
    Keep the YAML header; delete the template

  6. Familiarize yourself with R Markdown by adding some headings and text.
    1. Hit <Enter> to leave a blank line between the header and the first line of text.
    2. On the next line (line 8), type: ## Part 1 (Assignment 2.1)
      • In the markdown document, two hash marks specify a second-level text heading.
      • Note: This is different from the R Script file, which only contains R code. Recall, a hashmark transforms code into a comment that is not evaluated or run in an R Script file (and in an R code chunk in R Markdown, which you will learn about soon).
    3. Hit <Enter> and, on the next line (line 9), type: ### Learning R Markdown
      • Three hash marks specify a third-level text heading.
    4. Hit <Enter> to leave another blank line (line 10).
    5. On the next line (line 11), type the following sentence: This R Markdown document contains my work for Assignment 2. It is my work and only my work.
      • To italicize, place a single asterisk (*) before and after the word or text segment.
      • To bold, place two asterisks (**) before and after the word or text segment.
      • To bold and italicize, place three asterisks (***) before and after the word or text segment.
      • There are a lot of sources online that explain various formatting options in R Markdown. For examples, check out here, here, here, here, and here. Also, check out Nicholas Tierney’s bookdown for descriptions of and solutions to common problems with RMarkdown.
  7. Before typing anything else, save your new R Markdown file in your “LastName_P680_work” folder. Name the file: LastName_P680_Assign2_RMD_YEAR_MO_DY
  8. Your RStudio session should now look similar to this:
    Your first R Markdown file

  9. Ready for one of the best parts of R Markdown? No more pasting screenshots into Word! Instead, you are going to use the “knit” button at the top of your R Markdown file to automatically create a Word document capturing your current work.
    1. Click the “knit” button.
    2. A Word document should pop up that looks a lot like this:
      Your first knitted html document using R Markdown

Part 2 (Assignment 2.2)

Goal: Read data and save as object.

First, you need to get data into RStudio. You already know how to do that in an R Script. It is the same process in R Markdown, except you need to add an “R code chunk” into your file.

  1. Create a second-level header in your R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
  2. Insert an R chunk
    1. Click “Code > Insert chunk” or click the “insert code chunk” button (green box with a “C” in it) and select the “R” option (see below).
      Insert R code chunk

  3. Inside the new R code chunk, load the following three packages: tidyverse, haven, and here
    • Recall, you only need to install packages one time. However, you must load them each time you start a new R session.
  4. After your first code chunk, create another second-level header in RMD titled: “Read Data into R”
  5. Insert another R code chunk
  6. In the new R code chunk, read and save the “2013YRBS.sav” SPSS datafile into an R data object named YRBS2013data
    • Forget how to do this? Refer to instructions in Assignment 1.
    • Before doing this, be sure you have visited the Week 3 Module, below the “R Assignment 2: Introduction to R Markdown” entry, on the class Canvas site to download the SPSS data file “2013YRBS.sav” and have saved it in your “LastName_P680_work > Datasets” folder. As with the Assignment 1 dataset, this file was downloaded from the companion website for Bachman & Paternoster’s Statistics for Criminology & Criminal Justice, 4th Ed.
  7. In the same code chunk, on a new line below your read data/save object command, type the name of your new R data object: YRBS2013data
    • This will call the object and provide a brief view of the data. (Note: You can get a similar but more visually appealing view by simply clicking on the object in the “Environment” window. More on this later.)
    • Your RStudio session should now look a lot like this:
      Reading data in R

    • As in the image, you should see 13,583 rows and 114 columns, corresponding to 13,583 individual observations and 114 variables (e.g., age; sex; grade).
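As a minimal sketch of steps 3 through 7, the two code chunks might contain something like the following. It assumes the three packages are already installed, that RStudio was opened from your P680_work folder so that here() resolves to it, and that the data file sits in the Datasets subfolder. The data_path name and the file.exists() guard are illustrative additions, not part of the assignment instructions.

```r
# Load the packages used in this assignment (install once, load every session)
library(tidyverse)
library(haven)
library(here)

# Build the file path relative to the project's top-level folder.
# Assumption: the file was saved as Datasets/2013YRBS.sav inside P680_work.
data_path <- here("Datasets", "2013YRBS.sav")

# Read the SPSS file, save it as an R object, then call the object by name
# to print a brief preview. The guard is illustrative: it only avoids an
# error if the file has not been downloaded yet.
if (file.exists(data_path)) {
  YRBS2013data <- read_spss(data_path)
  YRBS2013data
} else {
  message("File not found: ", data_path)
}
```

In your own RMD file, the library() lines would go in the “Load Libraries” chunk and the remaining lines in the “Read Data into R” chunk.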

Part 3 (Assignment 2.3)

Goal: Use sjPlot::view_df() and attr() functions to view variables and their attributes.

The following activities parallel some of the “SPSS Exercises” found at the end of B&P’s Chapter 1, where those authors walk readers through the SPSS program’s “Variable View” feature (e.g., see “Navigating SPSS” in B&P’s Ch. 1, Question 2, on page 21).

If you have used SPSS before, then you might be familiar with its “Variable View” feature, and there may be times when you find yourself missing this feature while working in RStudio. While RStudio does not have a built-in “Variable View” function, we can generate something similar using the view_df() function from the sjPlot package. (For additional instructions on view_df() and other methods for describing variables in R, see Martin Chan’s blog on viewing SPSS labels in R.)

  1. Add a new second-level RMD heading called “Variable View using sjPlot”
  2. Install the sjPlot package.
    • Recall, you can do this in an R code chunk. However, if you do, remember to COMMENT OUT the install.packages("sjPlot") line afterwards. You do not want to keep installing packages every time you run your R code. Alternatively, some people recommend typing install.packages() commands directly into the RStudio Console (bottom left of RStudio) or using the install option under the “Packages” tab (bottom right of RStudio).
  3. Insert a new R code chunk and load the sjPlot package.
  4. The code provided for using the view_df() function will also introduce you to a “pipe” - %>% - an immensely useful coding element made available by the tidyverse (it originates in the magrittr package) that efficiently links together sequenced actions. In this case, we will call the data object (YRBS2013data) and then use a pipe to connect it in sequence to the view_df() function from sjPlot.
    1. Type the following into your new code chunk: YRBS2013data %>% view_df()
    2. You should now see something like this in the “Viewer” tab (bottom right in RStudio):
      Variable View using sjPlot
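To see what a pipe does on its own, here is a small sketch using made-up numbers rather than the YRBS data. It loads magrittr directly, the package where %>% originates; loading the tidyverse makes the same operator available. The object names x, nested, and piped are purely illustrative.

```r
# %>% originates in magrittr; tidyverse packages such as dplyr re-export it
library(magrittr)

# Toy values (made up for illustration)
x <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- sum(sqrt(x))

# Piped form: read left to right ("take x, then sqrt(), then sum()")
piped <- x %>% sqrt() %>% sum()

# Check that both forms give the same result
identical(nested, piped)
```

The piped version passes the result on its left as the first argument of the function on its right, which is why YRBS2013data %>% view_df() is another way of writing view_df(YRBS2013data).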

In B&P’s SPSS Exercise 2.ii.1 at the end of Chapter 1, the authors ask readers to use Variable View to describe the variable name, variable label, and value labels for the following four variables:

a. Row 2
b. Row 4
c. Row 23
d. Row 45

Using view_df(), you can find the variable name for a given row by referring to the Name column (see the “Viewer” tab). For instance, the variable name for Row 5 is “race7.” You can find the variable label by referring to the Label column. For example, the variable label for “race7” is “7-level race variable.” Finally, you can find value labels (e.g., survey response options) by referring to the Value Labels column. For instance, the value labels for “race7” are 1=Am Indian/Alaska Native, 2=Asian, 3=Black or African American, etc., through 7=Multiple - Non-Hispanic.

While the sjPlot::view_df() function shows all variables in a dataframe (similar to the “Variable View” in SPSS), the attr() function can be used to describe the attributes of a specific variable.

  1. Add a new second-level RMD heading called “Variable Attributes using attr”.
  2. Insert a new R code chunk.
  3. The code provided for using the attr() function once again will use a pipe to link our data to a specific action. Additionally, it will introduce you to another important coding symbol - the $ - used to call a specific named element (e.g., a column or variable) within an object (e.g., dataframe or tibble). In this case, we will call a specific variable (race7) from our data object (YRBS2013data) like this: YRBS2013data$race7. We then use a pipe to connect it in sequence to the attr() function.
    • Note: The singular attr('label') typically requests a variable label, whereas the plural attr('labels') typically requests the value labels.
    1. To view the variable label that describes the content of the race7 variable, type the following into your new code chunk: YRBS2013data$race7 %>% attr('label')
    2. To view the value labels that describe the survey response options for the race7 variable, type the following into your new code chunk: YRBS2013data$race7 %>% attr('labels')
    3. Run this code chunk. Your RMD file should output the following:
      Variable Attributes using attr

  4. Finally, B&P’s Chapter 1 SPSS Exercises also ask about specific response values for particular variables in the data, and they point readers to the “Data View” to answer these. You can view your data in various ways in RStudio. Recall that a simple and convenient way is to click on the data object in the “Environment” window after reading and saving the data as an object; this opens the data as a read-only table in a new tab.
    1. Remember to refer to a variable’s value labels to determine what a specific numeric value in the data means for that variable.
    2. Missing values are indicated with a “.” in SPSS but with an NA in R.
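The following sketch shows how $, attr(), and NA fit together. It runs on a small made-up data frame rather than the YRBS data. Only the label text and the first three value labels are taken from the race7 example above; the object names (toy_data, race, var_label, value_labels, code_2, n_missing) and the data values are hypothetical.

```r
# Toy stand-in for a labelled SPSS variable. The data values are made up;
# the label text and value labels mirror the race7 example in the text.
toy_data <- data.frame(id = 1:4)
toy_data$race7 <- structure(
  c(2, 1, NA, 3),
  label  = "7-level race variable",
  labels = c("Am Indian/Alaska Native" = 1,
             "Asian" = 2,
             "Black or African American" = 3)
)

# $ pulls one variable (column) out of the data object
race <- toy_data$race7

# Singular 'label': the variable label
var_label <- attr(race, "label")

# Plural 'labels': the value labels, stored as a named vector of codes
value_labels <- attr(race, "labels")

# Look up what the numeric code 2 means for this variable
code_2 <- names(value_labels)[value_labels == 2]

# Missing values appear as NA in R; count them
n_missing <- sum(is.na(race))

var_label
value_labels
code_2
n_missing
```

With the real data, attr(YRBS2013data$race7, 'label') is equivalent to the piped form YRBS2013data$race7 %>% attr('label').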

You should now have everything that you need to view and describe the variables, variable labels, value labels, and response values in a dataset!

  1. In your R Markdown file, use attr('label') and attr('labels') in code chunks, and describe in your markdown text the variable name and variable label (if the variable has both a name and a label) as well as the numeric response values and associated response value labels for the variables in the following four rows: Row 2; Row 4; Row 23; Row 45.
    1. Keep the file clean and easy to follow by using RMD level headings (e.g., denoted with ## or ###) separating R code chunks, organized by assignment questions.
    • Toward this aim, try using R code chunk options to clean up your document. For example, while the view_df() function can be quite useful while working with data in an RMD document, including this code chunk in your knitted document will result in several pages containing a print-out of all the variables in your dataset. We usually do not want this type of output in our final clean knitted document. In the first line of that R code chunk, try placing a comma right after the {r and then typing eval=FALSE before the closing } bracket. The option eval = FALSE means that the code chunk should not be evaluated (i.e., “run”) when knitting the document. There are various other useful code chunk options, such as include = FALSE, echo = FALSE, or results = FALSE. For more information, check out here, here and here.
    • You might also want to change the default html theme to something more aesthetically pleasing. For examples, see here, here, and here.
    2. Write plain text after headings and before or after code chunks to explain what you are doing - such text will serve as useful reminders to you when working on later assignments!
    3. Upon completing the assignment, “knit” your final RMD file again and save the final knitted Word document to your “Assignments” folder in your LastName_P680_work folder as: LastName_P680_Assign2_YEAR_MO_DY.
    4. Inside the “LastName_P680_commit” folder in our shared folder, create another folder named: Assignment 2.
    5. To submit your assignment for grading, save copies of both (1) your “Assign2” Word file and (2) your “Assign2_RMD” file into the LastName_P680_commit > Assignment 2 folder. Remember, be sure to save copies of both files - do not just drag the files over from your “work” folder, or you may lose those original copies from your “work” folder.