Assignment 3 Objectives

The purpose of this third assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapters 2 and 3 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.

These chapters focused on data distributions and displaying data with tabular or graphical representations. As with the two previous assignments, you will be using R Markdown (with R & RStudio) to complete and present your work. In this assignment, you will learn how to recode variables, generate frequency tables, and create simple graphs in R.

By the end of assignment #3, you should be able to…

  • create simple frequency tables using sjmisc::frq() and summarytools::freq()
  • identify strengths and limitations of frq() and freq() for creating frequency tables
  • sort a frequency table by frequency, from highest to lowest and from lowest to highest frequencies
  • recognize summarytools::dfsummary() as another way to quickly describe one or more variables in a data file
  • use theggplot()function from “ggplot2” package to generate basic bar charts and histograms
  • select specific variables using dplyr::select()
  • recode variables using mutate() and if_else() functions from the “dplyr” package
  • understand how the if_else() function works and why we use it instead of ifelse()

Assumptions & Ground Rules

We are building on objectives from Assignments 1 & 2. By the start of this assignment, you should already know how to:

Basic R/RStudio skills

  • create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
  • install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run
  • recognize when a function is being called from a specific package using a double colon with the package::function() format
  • read in an SPSS data file in an R code chunk using haven::read_spss() and assign it to an R object using an assignment (<-) operator
  • use the $ symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format dataobject$varname
  • use a tidyverse %>% pipe operator to perform a sequence of actions
  • knit your RMD document into an HTML file that you can then save and submit for course credit

Reproducibility

  • use here() for a simple and reproducible self-referential file directory method
  • Use groundhog.library() as an optional but recommended reproducible alternative to library() for loading packages

Data viewing & wrangling

  • use sjPlot::view_df() to quickly browse variables in a data file
  • use attr() to identify variable and attribute value labels


If you do not recall how to do these things, first review Assignments 1 & 2.

Additionally, you should have read the assigned book chapters and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:

  • units of analysis
  • variable levels of measurement
  • skewness
  • rates, percents, proportions, intervals, and interval widths
  • appropriate graphs for different types of variables (e.g., at different levels of measurement)
    • difference between histograms and bar charts and when each is appropriate
    • difference between histograms and line graphs and when each is appropriate

As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).

  • Early on, you may have a lot of trouble getting your code to run due to minor typos. This is normal.
  • Remember, you are learning to read and write a new (coding) language. As with learning any new languages, we learn from practice - and from correcting our mistakes.

Part 1 (Assignment 3.1)

Goal: Create a new RMD file for Assignment 3

(Note: Remember that, when following instructions, always substitute “LastName” for your own last name and substitute YEAR_MO_DY for the actual date. E.g., 2022_09_01_Fordham_K300_Assign3)

In the second assignment, you learned how to read in and assign a dataset to an R object. You also learned how to use the view_df function from the sjPlot package and the base R attr() function to display your dataframe and identify variable attributes. In this third assignment, you will use the “sjmisc” and “summarytools” packages to display your descriptive data in frequency tables. You will also learn about the dfsummary() function from the “summarytools” package, which is an alternative to sjPlot::view_df for creating a useful summary of all or a subset of the variables in a dataset. Additionally, you will learn how to select and recode variables using the select(), mutate(), and if_else functions from the “dplyr” package, and how to display your data in basic bar charts or histograms using the ggplot() function from the “ggplot2” package.

  1. Go to your K300_L folder, which should contain the R Markdown file you created for Assignment 2 (named YEAR_MO_DY_LastName_K300Assign2). Click to open the R Markdown file.
    • Remember, we open RStudio in this way so the here package will automatically set our K300_L folder as the top-level directory.

  2. In RStudio, open a new R Markdown document. If you do not recall how to do this, refer to Assignment 1.

  3. The dialogue box asks for a Title, an Author, and a Default Output Format for your new R Markdown file.
    1. In the Title box, enter K300 Assignment 3.
    2. In the Author box, enter your First and Last Name (e.g., Tyeisha Fordham).
    3. Under Default Output Format box, be sure “HTML” is selected (HTML is usually the default selection)

  4. Remember that the new R Markdown file contains a simple pre-populated template to show users how to do basic tasks like add settings, create text headings and text, insert R code chunks, and create plots. Be sure to delete all the text after the YAML header before you begin working.

  5. This assignment must be completed by the student and the student alone. To confirm that this is your work, please begin all assignments with this text:
    • This R Markdown document contains my work for Assignment 3. It is my work and only my work.

Part 2 (Assignment 3.2)

Goal: Read in 2012 States Data and view variable information

  1. Create a second-level heading titled: “Part 1 (Assignment 3.1): Reading in and viewing 2012 States Data”
    1. Remember, a second-level heading starts with two hashtags followed by a space and the heading title, like this: ## Heading Title
    2. A third-level heading starts with three hashtags: ### Heading Title
    3. A fourth-level heading starts with four hashtags: #### Heading Title

  2. Now, you need to get data into RStudio. You already know how to do this, but please refer to Assignment 1 if you have questions.
    1. Create a third-level header in R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
    2. Insert an R code chunk
    3. Inside the new R code chunk, load the following six packages: tidyverse, haven, here, sjmisc, sjPlot, and summarytools.
    4. Some of these packages will need to be installed. Remember, you only need to install a package once, but you must load a package each time you start a new R session and need to use the package.

  3. After your first code chunk, create another third-level header in RMD titled: “Read Data into R”
    1. Insert another R code chunk.
    2. In the new R code chunk, read and assign the “2012 states data.sav” SPSS datafile into an R data object named StatesData2012.
      • Forget how to do this? Refer to instructions in Assignment 1.
    3. In the same code chunk, on a new line below your read data/assign object command, type the name of your new R data object: StatesData2012.
      • This will call the object and provide a brief view of the data. (Note: You can also simply click on the data object in the “Environment” window.)
      • Your R studio session should now look a lot like this: