Assignment 5 Objectives

The purpose of this fifth assignment is to help you use R to complete some of the Assignment 5 exercises adapted from the SPSS Exercises at the end of Chapter 4 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.

This chapter focused on measures of central tendency (e.g., mean, median, and mode) and their advantages and disadvantages as single statistical descriptions of a data distribution. Likewise, in this assignment, you will learn how to use R to calculate measures of central tendency and other statistics (e.g., skewness; kurtosis) that help us standardize and efficiently describe the shape of a data distribution. You will also get additional practice with creating frequency tables and simple graphs in R. As with previous assignments, you will be using R Markdown (with R & RStudio) to complete and submit your work.

By the end of Assignment 5, you should…

  • know how to use the base R head() function to quickly view a snapshot of your data
  • know how to use the glimpse() function to quickly view all columns (variables) in your data
  • be able to improve some knitted tables by piping a function’s results to gt() (e.g., head(data) %>% gt())
  • be able to calculate measures of central tendency for a variable distribution using the base R functions mean() and median()
  • know how to calculate central tendency and other basic descriptive statistics for specific variables in a dataset using the summarytools::descr() function
  • have more practice with using the $ operator to reference or call a named element from a list or data object, such as a specific variable in a data file (e.g., mean(data$variable))
  • have more practice with creating frequency tables using sjmisc::frq() and summarytools::freq()
  • have more practice with generating simple histograms using ggplot()
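
As a preview of the first few objectives, here is a minimal sketch that uses only base R. The data frame toy_data and its column hours_studied are hypothetical names with invented values (they are not part of the Youth dataset); the point is only to show head(), the $ operator, mean(), and median() working together.

```r
# Hypothetical toy data, invented for illustration (not the Youth dataset).
toy_data <- data.frame(
  id = 1:7,
  hours_studied = c(2, 3, 3, 4, 5, 6, 19)
)

# head() shows the first rows of a data object; n sets how many rows.
head(toy_data, n = 3)

# The $ operator pulls one named column out of the data frame as a vector.
toy_mean <- mean(toy_data$hours_studied)
toy_median <- median(toy_data$hours_studied)

toy_mean    # 6
toy_median  # 4
```

If a real variable contains missing values coded as NA, mean() and median() return NA unless you add na.rm = TRUE, as in mean(data$variable, na.rm = TRUE).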

Assumptions & Ground Rules

We are building on objectives from Assignments 1-4. By the start of this assignment, you should already know how to:

Basic R/RStudio skills

  • create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
  • install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run
  • recognize when a function is being called from a specific package using a double colon with the package::function() format
  • read in an SPSS data file in an R code chunk using haven::read_spss() and assign it to an R object using an assignment (<-) operator
  • use the $ symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format dataobject$varname
  • use a tidyverse %>% pipe operator to perform a sequence of actions
  • knit your RMD document into a Word file that you can then save and submit for course credit

Reproducibility

  • use here() for a simple and reproducible self-referential file directory method

Data viewing & wrangling

  • use sjPlot::view_df() to quickly browse variables in a data file
  • use attr() to identify variable and attribute value labels
  • recognize when missing values are coded as NA for variables in your data file
  • select and recode variables using dplyr’s select(), mutate(), and if_else() functions

Descriptive data analysis

  • use summarytools::dfSummary() to quickly describe one or more variables in a data file
  • create frequency tables with sjmisc::frq() and summarytools::freq() functions
  • sort frequency distributions (lowest to highest/highest to lowest) with summarytools::freq()

Data visualization & aesthetics

  • create basic graphs using ggplot2’s ggplot() function

If you do not recall how to do these things, first review Assignments 1-4.

Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:

  • measures of central tendency: mean, median, and mode
  • how to recognize a unimodal or a bimodal distribution and to identify the mode of a distribution
  • how to calculate the median position from raw data or from grouped data
  • how to calculate the arithmetic mean from raw data, grouped data, or a frequency distribution
  • how skewness affects measures of central tendency
  • comparative advantages and disadvantages of the mean and the median
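
If you want to check your grasp of these ideas before starting, the following is a small base R sketch. The scores vector is hypothetical and invented for illustration. It finds the median position with the (n + 1) / 2 rule, identifies the mode from a frequency table, and shows how one extreme value pulls the mean above the median.

```r
# Hypothetical scores, invented for illustration only.
scores <- c(1, 2, 2, 3, 4, 5, 18)

# Median position for raw data: (n + 1) / 2, counted in the sorted values.
# With an odd n this is a whole number; with an even n you would average
# the two middle values instead.
n <- length(scores)
median_position <- (n + 1) / 2
median_by_hand <- sort(scores)[median_position]

# Mode: the value(s) with the highest frequency. Base R's mode() reports an
# object's storage type, not the statistical mode, so we use table() instead.
freq_table <- table(scores)
mode_value <- as.numeric(names(freq_table)[freq_table == max(freq_table)])

median_position  # 4
median_by_hand   # 3
mode_value       # 2

# The single extreme score (18) pulls the mean upward; the median stays put.
mean(scores)     # 5
median(scores)   # 3
```

Because the mean (5) sits above the median (3), this toy distribution is positively (right) skewed, which is the situation where the median is usually the better single summary.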

Part 1 (Assignment 5.1)

Goal: Create a new RMD file for Assignment 5

(Note: Remember that, when following instructions, always substitute your own last name for “LastName” and the actual date for YEAR-MO-DY, e.g., 2023-01-23_Ducate_CRIM5305_Assign05.)

In the last assignment, you learned how to use sjmisc::frq() and summarytools::freq() functions to generate frequency tables for variables. You also learned about the summarytools::dfSummary() function for quickly summarizing all or a subset of the variables in a data object. Lastly, you learned how to select and recode variables using dplyr’s mutate() and if_else() functions as well as how to display data in graphs using ggplot(). In this assignment, you will decide which measure of central tendency is most appropriate for a given variable, then use frequency tables and R functions to calculate measures of central tendency and other univariate descriptive statistics.

  1. Go to your CRIM5305_L folder, which should contain the R Markdown file you created for Assignment 4 (named YEAR-MO-DY_LastName_CRIM5305_Assign04). Click to open the R Markdown file.
    • Remember, we open RStudio in this way so the here package will automatically set our CRIM5305_L folder as the top-level directory.

  2. In RStudio, open a new R Markdown document. If you do not recall how to do this, refer to Assignment 1.

  3. The dialogue box asks for a Title, an Author, and a Default Output Format for your new R Markdown file.
    1. In the Title box, enter CRIM5305 Assignment 5.
    2. In the Author box, enter your First and Last Name (e.g., Caitlin Ducate).
    3. Under Default Output Format, ensure “Word” is selected.
  4. Remember that the new R Markdown file contains a simple pre-populated template to show users how to do basic tasks like add settings, create text headings and text, insert R code chunks, and create plots. Be sure to delete this text before you begin working.

  5. Create a second-level heading titled: “Part 1 (Assignment 5.1)”
    • Remember, a second-level heading starts with two hashtags followed by a space and the heading title, like this: ## Heading Title
  6. This assignment must be completed by the student and the student alone. To confirm that this is your work, please begin all assignments with this text:
    • This R Markdown document contains my work for Assignment 5. It is my work and only my work.

Part 2 (Assignment 5.2)

Goal: Reading in Data and Creating a Frequency Table from Youth Data

We will begin by reading in the Youth dataset and creating a frequency table of the “parnt2” variable. The frequency table will allow us to answer the first question on pages 100-101.

  1. Create a second-level header titled: “Part 2 (Assignment 5.2).”

  2. Create a third-level header in your R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
    1. Insert an R chunk.
    2. Inside the new R code chunk, load the following packages: tidyverse, haven, here, sjmisc, sjPlot, summarytools, and gt.
      • Note: A new package, “gt”, is listed above. Before loading it, remember that you must first install the package. Also, recall that you only need to install packages one time, but you must load them each time you start a new R session.

  3. After your first code chunk, create another third-level header in RMD titled: “Read Data into R”
    1. Insert another R code chunk.

    2. In the new R code chunk, read in the “Youth_0.sav” SPSS data file and assign it to an R data object named YouthData.

      • Forget how to do this? Refer to instructions in Assignment 1.
    3. In the same code chunk, on a new line below your read data/assign object command, type the name of your new R data object: YouthData.

      • Remember, this will call the object and provide a brief view of the data.
      • Also, remember that you can get a similar but more visually appealing view by simply clicking on the object in the “Environment” window.
      • Your RStudio session should now look a lot like this:
      YouthData <- read_spss(here("Datasets", "Youth_0.sav"))
      YouthData
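
If the “Load Libraries” chunk from step 2 fails, the usual cause is a package that has not been installed yet. The sketch below is one possible way to check, not a required part of the assignment. The object names pkgs, installed, and missing_pkgs are made up for this example; requireNamespace() and install.packages() are standard R functions.

```r
# Packages named in step 2 of Part 2.
pkgs <- c("tidyverse", "haven", "here", "sjmisc", "sjPlot", "summarytools", "gt")

# requireNamespace() returns FALSE (instead of stopping with an error)
# when a package is not installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need to be installed.
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Install only what is missing. Run this once, then comment it out again:
# install.packages(missing_pkgs)

# In the assignment chunk itself, load each package with library(), e.g.:
# library(tidyverse)
# library(gt)
```

An empty result for missing_pkgs (character(0)) means every package is installed and the library() calls should work.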