Assignment 2 Objectives

The purpose of this second assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 1 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.

Following Assignment 1, you will create an R Markdown file in which to save and present your work for this class. Additionally, for this assignment, you will use R/RStudio to view variables in a data file and find information about variables, including variable names, labels, and values. This assignment should help you navigate R/RStudio and become comfortable finding basic information within data files.

By the end of assignment #2, you should be able to…

  • use the view_df() function from the “sjPlot” package to quickly browse variables in a data file
  • use a tidyverse “pipe” (%>%) coding operator (from the “magrittr” package) to link together sequenced actions, such as calling a data object and then applying a function to a variable in that data object
  • use the attr() function from base R to identify variable labels and value labels
  • understand how the $ symbol can be used to call a specific named element (e.g., a variable, which is stored as a column) within an object (e.g., dataframe or tibble)
  • recognize when missing values are coded as NA for variables in your data file
  • recognize when a function is being called from a specific package using a double colon (i.e., with the package::function() format)
  • knit your RMD file into an HTML document and then save and submit it for course credit

Assumptions & Ground Rules

We are building on Assignment 1 objectives. By the start of this assignment, you should already know how to:

Basic R/RStudio skills

  • create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
  • install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run

Reproducibility

  • use here() for a simple and reproducible self-referential file directory method
  • use groundhog.library() as an optional but recommended reproducible alternative to library() for loading packages


If you do not recall how to do these things, first review Assignment 1.

Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:

  • difference between descriptive and inferential statistics
  • validity and reliability
  • difference between a population and a sample
  • sampling techniques

As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).

  • Early on, you may have a lot of trouble getting your code to run due to minor typos. This is normal.
  • Remember, you are learning to read and write a new (coding) language. As with learning any new language, we learn from practice - and from correcting our mistakes.

Part 1 (Assignment 2.1)

Goal: Create a new R Markdown file in which to complete your Assignment 2.

(Note: Remember that, when following instructions, always substitute your own last name for “LastName” and the actual date for YEAR_MO_DY. E.g., 2022_05_20_Fordham_K300Assign2_RMD)

In the first assignment, you learned how to create a new R Markdown file and use it to write and run R code, and make comments. You also saw how running certain commands (e.g., read_spss) from an R Markdown file will generate results in the RStudio Console and learned how to assign the results of such commands into an R object. In Assignment 2, you will learn how to read in and assign datasets as an R object. You will also learn how to use the sjPlot package to quickly view variables with its view_df() function and to use the base R attr() function to identify variable labels and variable attribute value labels.

  1. Go to your K300_L folder, which should contain the R Markdown file you created for Assignment 1 (named YEAR_MO_DY_LastName_K300Assign1_RMD). Click to open the R Markdown file.
    • Remember, we open RStudio like this because it is an easy way to ensure the here package automatically sets our K300_L folder as the top-level working directory.

  2. In RStudio, open a new R Markdown document. If you do not recall how to do this, refer to Assignment 1.

  3. The dialogue box asks for a Title, an Author, and a Default Output Format for your new R Markdown file.
    1. In the Title box, enter K300 Assignment 2.
    2. In the Author box, enter your First and Last Name (e.g., Jon Brauer & Tyeisha Fordham).
    3. In the Default Output Format box, select “HTML” (HTML is usually the default selection)
      • (Note: In the first assignment, you knitted to a Word document. For this and remaining assignments, you will use the default setting to knit to HTML documents instead)

  4. Remember that the new R Markdown file contains a simple pre-populated template to show users how to do basic tasks like add settings, create text headings and text, insert R code chunks, and create plots. Be sure to delete this text before you begin working.

Part 2 (Assignment 2.2)

Goal: Read data and assign to R object.

  1. Create a second-level header titled: “Part 1 (Assignment 2.1).” Then, create a third-level header titled: “Learning R Markdown.”

  2. This assignment must be completed by the student and the student alone. To confirm that this is your work, please begin all assignments with this text: This R Markdown document contains my work for Assignment 2. It is my work and only my work.

  3. Now, you need to get data into RStudio. You already know how to do this, but please refer to Assignment 1 if you have questions.
    1. Create a second-level header in R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
    2. Insert an R chunk
    3. Inside the new R code chunk, load the following three packages: tidyverse, haven, and here.
      • Recall, you only need to install packages one time. However, you must load them each time you start a new R session.
      • Also, if you did not optionally install the “tidyverse” package for Assignment 1 (because it includes the “haven” package), then you will need to install that package before loading it. You can do this by typing install.packages("tidyverse") in the R console. Alternatively, you can type that into an R chunk - just remember to comment out the command after running it (by adding a “#” in front of it).

  4. After your first code chunk, create another second-level header in RMD titled: “Read Data into R”
    1. Insert another R code chunk
    2. In the new R code chunk, read and assign the “2013YRBS.sav” SPSS datafile into an R data object named YRBS2013data.
      • Forget how to do this? Refer to instructions in Assignment 1.
    3. In the same code chunk, on a new line below your read data/assign object command, type the name of your new R data object: YRBS2013data. This will call the object and provide a brief view of the data. (Note: You can get a similar but more visually appealing view by simply clicking on the object in the “Environment” window. More on this later.) Your RStudio session should now look a lot like this:
Reading data in R


  1. As in the image, you should see 13,583 rows and 114 columns, which corresponds to 13,583 individual observations and 114 variables (e.g., age; sex; grade).

  2. Click “knit” to make sure it generates an HTML document without errors at this point. It is a good idea to do this periodically, as it makes it easier to identify and correct errors when they occur.
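
For troubleshooting only (remember to type your own code by hand), here is a minimal sketch of what the “Load Libraries” and “Read Data into R” chunks might contain. The “Datasets” subfolder is an assumption made for illustration; use whatever location you set up in Assignment 1.

```r
# Load the three packages named in the instructions.
library(tidyverse)
library(haven)
library(here)

# ASSUMPTION: the data file sits in a "Datasets" subfolder of K300_L.
# Change this path to match your own folder layout.
data_path <- here("Datasets", "2013YRBS.sav")

if (file.exists(data_path)) {
  # Read the SPSS file and assign it to an R data object.
  YRBS2013data <- read_spss(data_path)

  # Call the object for a brief view of the data.
  YRBS2013data

  # Rows and columns; the assignment reports 13,583 rows and 114 columns.
  dim(YRBS2013data)
} else {
  message("Data file not found at: ", data_path)
}
```

If you see the “not found” message, check the folder name and make sure you opened RStudio from your K300_L folder.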

Part 3 (Assignment 2.3)

Goal: Use sjPlot::view_df() and attr() functions to complete “SPSS Exercises” at the end of B&P’s Ch.1 (pp.20-21).

This week’s assignment will ask questions that parallel those found in the SPSS exercises at the end of B&P’s Chapter 1. In this section, you will learn about a couple of functions that will help you answer these questions.

First, refer to B&P’s Chapter 1, Question 2, on page 21 (“Navigating SPSS”), which refers to a “Variable View” in SPSS. While RStudio does not have a built-in “Variable View” like the one found in SPSS, we can generate something similar using the view_df() function from the sjPlot package. With this function, you should be able to answer the questions in that exercise. (For additional instructions on view_df() and other methods for describing variables in R, see Martin Chan’s blog on viewing SPSS labels in R.)

  1. Add a new second-level RMD heading called “Variable View using sjPlot”

  2. Install sjPlot package
    • Recall, you can do this in an R code chunk. However, if you do, remember to COMMENT OUT the install.packages("sjPlot") line afterwards. You do not want to keep installing packages every time you run your R code. Alternatively, some people recommend typing install.packages() commands directly into the RStudio Console (bottom left of RStudio) or using the install option under the “Packages” tab (bottom right of RStudio).

  3. Insert a new R code chunk and load the sjPlot package library

  4. The code provided for using the view_df() function will also introduce you to a “pipe” - %>% - an immensely useful coding element from the tidyverse (specifically, the “magrittr” package) that efficiently links together sequenced actions. In this case, we will call the data object (YRBS2013data) and then use a pipe to connect it in sequence to the view_df() function from sjPlot.
    1. Type the following into your new code chunk: YRBS2013data %>% view_df()
    2. You should now see something like this in the “Viewer” tab (bottom right in RStudio):
Variable View using sjPlot


Now, refer back to B&P’s SPSS Exercise 2.ii.1 at the end of Chapter 1, which asks about the following four variables: a. Row 2 b. Row 4 c. Row 23 d. Row 45

Using view_df(), you can answer questions about the variable name for a given row by referring to the Name column (see “Viewer” tab). For instance, the variable name for Row 5 is “race7.” You can also answer questions about the variable label by referring to the Label column. For example, the variable label for “race7” is 7-level race variable. Finally, you can answer questions about value labels (e.g., survey response options) by referring to the Value Labels column. For instance, the value labels for “race7” are 1=Am Indian/Alaska Native, 2=Asian, 3=Black or African American, etc., through 7=Multiple - Non-Hispanic.
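
If the pipe still feels abstract, the short sketch below may help. It uses a small made-up data frame (called toy here purely for illustration) rather than the YRBS data, and it shows that a pipe simply passes whatever is on its left as the first argument to the function on its right. By the same logic, YRBS2013data %>% view_df() is another way of writing view_df(YRBS2013data).

```r
library(magrittr)  # provides %>%; the pipe is also available after loading tidyverse

# Hypothetical toy data standing in for YRBS2013data.
toy <- data.frame(age = c(15, 16, 17), grade = c(9, 10, 11))

# These two lines do the same thing.
names(toy)
toy %>% names()

# Pipes can be chained and are read left to right:
# take toy, then get its variable names, then count them.
toy %>% names() %>% length()
```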

While the sjPlot::view_df() function (used above as YRBS2013data %>% view_df()) shows all variables in a dataframe (similar to the “Variable View” in SPSS), the attr() function can be used to describe the attributes of a specific variable.

  • Note: As above, you will sometimes see packages and functions written together, separated by a double colon (::). Since R is open-source technology, it is common for user-written packages to use the same names for their functions. For instance, in the next assignment, you will learn about the select() function from the “dplyr” package. The term “select” is quite common, so the select() command may have conflicts across packages. One way to ensure that you are calling the function from the package that you want is by specifically calling the package first, followed by a double colon and then the function, using the following format: package::function().
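
As a small illustration of the package::function() format, the sketch below uses sd() from the “stats” package, which R loads by default. The numbers are made up for the example.

```r
# Hypothetical values, used only to have something to summarize.
x <- c(2, 4, 4, 6)

# The short call and the fully qualified call run the same function.
sd(x)
stats::sd(x)
identical(sd(x), stats::sd(x))

# The same format also works for a package that is installed but not loaded,
# e.g., sjPlot::view_df(YRBS2013data) without first running library(sjPlot).
```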
  1. Add a new second-level RMD heading called “Variable Attributes using attr()”

  2. Insert a new R code chunk.

  3. The code provided for using the attr() function once again will use a pipe to link our data to a specific action. Additionally, it will introduce you to another important coding symbol - the $ - used to call a specific named element (e.g., a variable, which is stored as a column) within an object (e.g., dataframe or tibble). In this case, we will call a specific variable (race7) from our data object (YRBS2013data) like this: YRBS2013data$race7. We then use a pipe to connect it in sequence to the attr() function.
    • Note: The singular attr('label') typically requests a variable label, whereas the plural attr('labels') typically requests the value labels.
    1. To view the variable label that describes the content of the race7 variable, type the following into your new code chunk: YRBS2013data$race7 %>% attr('label')
    2. To view the value labels that describe the survey response options for the race7 variable, type the following into your new code chunk: YRBS2013data$race7 %>% attr('labels')
    3. Run this code chunk. Your RMD file should output the following:
Variable Attributes using attr
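
To see what attr() is doing behind the scenes, here is a minimal sketch that builds a toy variable with the same two attributes (label and labels). The toy data frame and its values are made up for illustration, and only a few of the race7 value labels are included.

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical toy data standing in for YRBS2013data.
toy <- data.frame(id = 1:4)
toy$race7 <- structure(
  c(3, 7, NA, 2),
  label  = "7-level race variable",
  labels = c("Am Indian/Alaska Native"   = 1,
             "Asian"                     = 2,
             "Black or African American" = 3,
             "Multiple - Non-Hispanic"   = 7)
)

# $ pulls out one variable; attr() then returns the requested attribute.
toy$race7 %>% attr("label")    # the variable label
toy$race7 %>% attr("labels")   # the value labels

# Look up what the numeric value 3 means for this variable.
value_labels <- toy$race7 %>% attr("labels")
names(value_labels)[value_labels == 3]
```

The last line is one way to translate a number you see in the data back into its response option.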


  1. Finally, some questions will ask about specific response values in the data. B&P’s exercises point you to the “Data View” to answer these. You can view your data in various ways in RStudio - recall a simple and functionally convenient way is to simply click on the data object in the “Environment” window after reading and assigning the data to an object, which will open the data as a read-only table in a new tab.
    1. Remember to refer to a variable’s value labels to determine what a specific numeric value in the data means for that variable.
    2. Missing values are indicated with a “.” in SPSS but with an NA in R.
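
As a quick illustration of how NA values behave, the sketch below uses a short made-up vector rather than the YRBS data. The is.na() function from base R flags which values are missing.

```r
# Hypothetical responses; NA marks a missing value.
age <- c(15, 16, NA, 17, NA)

# TRUE wherever a value is missing.
is.na(age)

# Count the missing values (TRUE counts as 1).
sum(is.na(age))

# With your own data, the same idea applies to a single variable,
# e.g., sum(is.na(YRBS2013data$race7))
```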

You should now have everything that you need to complete the questions in Assignment 2 that parallel those from B&P’s SPSS Exercises for Chapter 1!

  1. Complete the remainder of the questions in Assignment 2 in your RMD file.
    1. Keep the file clean and easy to follow by using RMD level headings (e.g., denoted with ## or ###) separating R code chunks, organized by assignment questions.
    2. Write plain text after headings and before or after code chunks to explain what you are doing - such text will serve as useful reminders to you when working on later assignments!
    3. Upon completing the assignment, “knit” your final RMD file again and save the final knitted HTML document to your “Assignments” folder as: YEAR_MO_DY_LastName_K300Assign2. Submit via Canvas in the relevant section (i.e., the last question) for Assignment 2.

Assignment 2 Objective Checks

After completing assignment #2, can you…

  • use view_df() from the “sjPlot” package to view variable information in a data file?
  • use a tidyverse “pipe” (%>%) coding operator (from the “magrittr” package) to call a data object and then apply a function (e.g., view_df()) to a variable in that data object?
  • use the attr() function to view variable labels and value labels?
  • understand how the $ symbol can be used to call a specific named element (e.g., a variable, which is stored as a column) within an object (e.g., dataframe or tibble)?
  • recognize when missing values are coded as NA for variables in your data file?
  • recognize when a function is being called from a specific package using a double colon (i.e., with the package::function() format)?
  • knit your RMD file into an HTML document and then save and submit it for course credit?