## The purpose of this fifth assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 5 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.

This chapter covered measures of dispersion, including variation ratio, range, interquartile range, variance, and standard deviation. We use measures of dispersion to summarize the “spread” (rather than central tendency) of a data distribution. In this assignment, you will learn how to use R to calculate measures of dispersion and create boxplots that help us standardize and efficiently describe the spread of a data distribution. You will also get additional practice creating frequency tables and simple graphs in R, and you will learn how to modify some elements (e.g., color) of a ggplot object. As with previous assignments, you will use R Markdown (with R and RStudio) to complete and submit your work.

By the end of this assignment, you should:

- be able to calculate measures of dispersion by hand from frequency tables you generate in R
- be able to generate some measures of dispersion (e.g., standard deviation) directly in R (e.g., with `sjmisc::frq()` or `summarytools::descr()`)
- be able to generate boxplots using base R `boxplot()` and `ggplot()` to visualize dispersion in a data distribution
- know how to change outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`) by adding `fill=` and `color=` followed by specific color names (e.g., “orange”) or hexadecimal codes (e.g., “#990000” for crimson; “#EDEBEB” for cream)
- know how to add or change a preset theme (e.g., `+ theme_minimal()`) to a ggplot object to conveniently modify certain plot elements (e.g., white background color)
- understand how to select colors from a colorblind accessible palette (e.g., using `viridisLite::viridis()`) and specify them for the outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`)
- be able to add a title (and subtitle or caption) to a ggplot object by adding a label with the `labs()` function (e.g., `+ labs(title = "My Title")`)
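
As a rough preview of the plotting objectives listed above, the sketch below draws one boxplot with base R and one with ggplot2. It assumes the `ggplot2` and `viridisLite` packages are installed, and it uses R's built-in `mtcars` data as a stand-in (it is not the course data). The object names and plot labels are hypothetical.

```r
library(ggplot2)

# Stand-in data: R's built-in mtcars dataset (NOT the course's Youth data)

# Base R boxplot; `col` sets the fill color and `border` sets the outline color
boxplot(mtcars$mpg, col = "orange", border = "#990000")

# Pick two colors from the colorblind-accessible viridis palette
vcolors <- viridisLite::viridis(2, begin = 0.2, end = 0.8)

# ggplot2 boxplot with custom fill/outline colors, a preset theme, and labels
mpg_box <- ggplot(mtcars, aes(y = mpg)) +
  geom_boxplot(fill = vcolors[2], color = vcolors[1]) +
  theme_minimal() +
  labs(
    title = "Miles per Gallon",
    subtitle = "Built-in mtcars data (illustration only)",
    caption = "Hypothetical example plot"
  )
mpg_box
```

To use named colors or hexadecimal codes instead of the viridis colors, you could replace the `fill` and `color` values with strings such as `"orange"` or `"#990000"`.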

## We are building on objectives from Assignments 1-4. By the start of this assignment, you should already know how to:

- create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
- install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run
- recognize when a function is being called from a specific package using a double colon with the `package::function()` format
- read in an SPSS data file in an R code chunk using `haven::read_spss()` and assign it to an R object using an assignment (`<-`) operator
- use the `$` symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format `dataobject$varname`
- use a tidyverse `%>%` pipe operator to perform a sequence of actions
- knit your RMD document into an HTML file that you can then save and submit for course credit
- use `here()` for a simple and reproducible self-referential file directory method
- use `groundhog.library()` as an optional but recommended reproducible alternative to `library()` for loading packages
- use the base R `head()` function to quickly view a snapshot of your data
- use the `glimpse()` function to quickly view all columns (variables) in your data
- use `sjPlot::view_df()` to quickly browse variables in a data file
- use `attr()` to identify variable and attribute value labels
- recognize when missing values are coded as `NA` for variables in your data file
- select and recode variables using dplyr’s `select()`, `mutate()`, and `if_else()` functions
- use `summarytools::dfSummary()` to quickly describe one or more variables in a data file
- create frequency tables with `sjmisc::frq()` and `summarytools::freq()` functions
- sort frequency distributions (lowest to highest/highest to lowest) with `summarytools::freq()`
- calculate measures of central tendency for a variable distribution using base R functions `mean()` and `median()` (e.g., `mean(data$variable)`)
- calculate central tendency and other basic descriptive statistics for specific variables in a dataset using `summarytools::descr()` and `psych::describe()` functions
- improve some knitted tables by piping a function’s results to `gt()` (e.g., `head(data) %>% gt()`)
- create basic graphs using ggplot2’s `ggplot()` function
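
As a quick refresher on a few of the skills listed above, here is a minimal sketch that assumes the `dplyr` package is installed. The `toy` data frame and its variable names are hypothetical and stand in for a real data file.

```r
library(dplyr)

# Hypothetical toy data standing in for a real data file
toy <- data.frame(id = 1:5, age = c(15, 16, NA, 17, 16))

head(toy)      # base R: view the first rows
glimpse(toy)   # view all columns (variables)

toy$age                         # `$` pulls one variable out of the data object
mean(toy$age, na.rm = TRUE)     # na.rm = TRUE drops missing (NA) values first
median(toy$age, na.rm = TRUE)

# Pipe a sequence of actions: select variables, then recode one of them
toy_recode <- toy %>%
  select(id, age) %>%
  mutate(age16plus = if_else(age >= 16, 1, 0))
toy_recode
```

Note that the row with a missing `age` stays `NA` in the recoded variable, because `if_else()` cannot evaluate the condition for a missing value.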

*If you do not recall how to do these things, review Assignments
1-4.*

Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:

- measures of dispersion, such as variation ratio, range, interquartile range (IQR), variance, and standard deviation
- the difference between range and IQR
- the relationship between variance and standard deviation
- how to calculate range, variation ratio, and IQR
- how to calculate variance of a population, a sample, and a sample with grouped data
- how to calculate standard deviation of a population, a sample, and a sample with grouped data
- how to calculate sample variance and standard deviation with ungrouped and grouped data using computational formulas
- boxplots, including steps for boxplot construction, elements of a boxplot, and how to read a boxplot to summarize the central tendency and dispersion of a data distribution
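
To connect the concepts listed above to R, the sketch below computes each measure for a small set of scores using only base R. The scores and the `offense` categories are invented for illustration. Keep in mind that `IQR()` relies on R's default quantile method, so it may differ slightly from a hand calculation that uses the textbook's steps.

```r
# Hypothetical scores, invented only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Range and interquartile range
x_range <- max(x) - min(x)
x_iqr <- IQR(x)  # R's default quantile method; hand methods can differ slightly

# Sample (n - 1 denominator) and population (n denominator) versions
samp_var <- var(x)
samp_sd <- sd(x)
pop_var <- sum((x - mean(x))^2) / n
pop_sd <- sqrt(pop_var)

# Computational formula for the sample variance, starting from a frequency table
freq_tab <- table(x)
values <- as.numeric(names(freq_tab))  # the distinct scores
f <- as.numeric(freq_tab)              # how often each score occurs
grouped_var <- (sum(f * values^2) - sum(f * values)^2 / sum(f)) / (sum(f) - 1)

# Variation ratio for a hypothetical categorical variable:
# 1 minus the proportion of cases in the modal category
offense <- c("theft", "theft", "assault", "theft", "burglary")
variation_ratio <- 1 - max(table(offense)) / length(offense)

c(range = x_range, IQR = x_iqr, sample_var = samp_var, pop_sd = pop_sd,
  grouped_var = grouped_var, variation_ratio = variation_ratio)
```

With these scores, the squared deviations from the mean sum to 32, so the population variance is 32/8 = 4 (standard deviation 2) and the sample variance is 32/7. The frequency-table version returns the same sample variance, which is a useful check on a hand calculation.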

As noted previously, for this and all future assignments, you MUST
type all commands in by hand. *Do not copy & paste except for
troubleshooting purposes (i.e., if you cannot figure out what you
mistyped).*

## Goal: Read in Youth Data and Determine Measures of Dispersion

*(**Note:** Remember that, when following instructions, always replace “LastName” with your own last name and replace YEAR_MO_DY with the actual date. E.g., 2022_06_08_Fordham_K300Assign5)*

In the last assignment, you learned how to identify or calculate measures of central tendency from frequency tables to summarize the most common or “expected” value of a data distribution. In doing so, you learned how to decide which measures of central tendency are most appropriate or useful for summarizing specific variables. In this assignment, you will use frequency tables and boxplots to calculate measures of dispersion and visualize the spread of several variables.

- Go to your K300_L folder, which should contain the R Markdown file you created for Assignment 4 (named **YEAR_MO_DY_LastName_K300Assign4**). Click to open the R Markdown file.
  - Remember, we open RStudio in this way so the `here` package will automatically set our K300_L folder as the top-level directory.
  - In RStudio, open a new R Markdown document. If you do not recall how to do this, refer to Assignment 1.
  - The dialogue box asks for a **Title**, an **Author**, and a **Default Output Format** for your new R Markdown file.
  - In the **Title** box, enter *K300 Assignment 5*.
  - In the **Author** box, enter your first and last name (e.g., *Tyeisha Fordham*).
  - Under **Default Output Format**, select “HTML document” (HTML is usually the default selection).
- Remember that the new R Markdown file contains a simple pre-populated template to show users how to do basic tasks like add settings, create text headings and text, insert R code chunks, and create plots. Be sure to delete this text before you begin working.
- Create a second-level header titled: “Part 1 (Assignment 5.1).” Then, create a third-level header titled: “Read in Youth Data and Determine Measures of Dispersion”
- This assignment must be completed by the student and the student alone. To confirm that this is your work, please begin all assignments with this text: This R Markdown document contains *my work* for Assignment 5. It is **my work** and *only* my work.
- Now, you need to get data into RStudio. You already know how to do this, but please refer to Assignment 1 if you cannot recall.

- Create a third-level header in your R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
- Insert an R chunk.
- Inside the new R code chunk, load the following packages: `tidyverse`, `haven`, `here`, `sjmisc`, `sjPlot`, and `summarytools`. In addition, install and load the `viridisLite` package.
  - Recall, you only need to install packages one time. However, you must load them each time you start a new R session. Also, remember that you can optionally use (and we recommend) `groundhog.library()` to improve the reproducibility of your script.
  - In this assignment, you will learn to customize your ggplot graphs, including changing the default color scheme to any colors you want. As we will explain later, the `viridisLite` package is helpful for identifying colors that are colorblind accessible.
- After your first code chunk, create another third-level header in RMD titled: “Read Data into R”
- Insert another R code chunk.
- In the new R code chunk, read the “Youth_0.sav” SPSS datafile and assign it to an R data object named `YouthData`.
  - Forget how to do this? Refer to Assignment 1.
- In the same code chunk, on a new line below your read data/assign object command, type the name of your new R data object: `YouthData`. This will call the object and provide a brief view of the data. *(**Note:** You can get a similar but more visually appealing view by simply clicking on the object in the “Environment” window.)* Your RStudio session should now look a lot like this:

`YouthData <- read_spss(here("Datasets", "Youth_0.sav"))`
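
Putting the steps above together, the two code chunks might look roughly like the sketch below. It is one possible layout, not the only correct one. The installation check and the `file.exists()` guard are additions for illustration (they are not part of the assignment instructions), and the names `pkgs`, `not_installed`, and `youth_path` are hypothetical. The file location is taken from the command shown above.

```r
# Packages named in the steps above
pkgs <- c("tidyverse", "haven", "here", "sjmisc", "sjPlot",
          "summarytools", "viridisLite")

# Check which of them are not installed yet (illustrative addition)
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
not_installed
# install.packages(not_installed)  # run once if anything is listed, then comment out

# Load the packages and read the data only when everything is installed
if (length(not_installed) == 0) {
  library(tidyverse)
  library(haven)
  library(here)
  library(sjmisc)
  library(sjPlot)
  library(summarytools)
  library(viridisLite)

  # Build the file path relative to the project's top-level folder
  youth_path <- here("Datasets", "Youth_0.sav")

  # Read the SPSS file only if it is where we expect it to be
  if (file.exists(youth_path)) {
    YouthData <- read_spss(youth_path)
    print(YouthData)  # call the object for a brief view of the data
  }
}
```

In your own RMD file, the `library()` calls would normally sit in the “Load Libraries” chunk and the `read_spss()` command in the “Read Data into R” chunk.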