The purpose of this fifth assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 5 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.

This chapter covered measures of dispersion, including variation ratio, range, interquartile range, variance, and standard deviation. We use measures of dispersion to summarize the “spread” (rather than central tendency) of a data distribution. Likewise, in this assignment, you will learn how to use R to calculate measures of dispersion and create boxplots that help us standardize and efficiently describe the spread of a data distribution. You will also get additional practice with creating frequency tables and simple graphs in R, and you will learn how to modify some elements (e.g., color) of a ggplot object. As with previous assignments, you will be using R Markdown (with R & RStudio) to complete and submit your work.

By the end of this assignment, you should:

- be able to calculate measures of dispersion by hand from frequency tables you generate in R
- be able to generate some measures of dispersion (e.g., standard deviation) directly in R (e.g., with `sjmisc::frq()` or `summarytools::descr()`)
- be able to generate boxplots using base R `boxplot()` and `ggplot()` to visualize dispersion in a data distribution
- know how to change outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`) by adding `fill=` and `color=` followed by specific color names (e.g., “turquoise”) or hexadecimal codes (e.g., “#990000” for crimson; “#EDEBEB” for cream)
- know how to add or change a preset theme (e.g., `+ theme_minimal()`) to a ggplot object to conveniently modify certain plot elements (e.g., white background color)
- be able to add a title (and subtitle or caption) to a ggplot object by adding a label with the `labs()` function (e.g., `+ labs(title = "My Title")`)

We are building on objectives from Assignments 1-4. By the start of this assignment, you should already know how to:

- create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
- install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run
- recognize when a function is being called from a specific package using a double colon with the `package::function()` format
- read in an SPSS data file in an R code chunk using `haven::read_spss()` and assign it to an R object using an assignment (`<-`) operator
- use the `$` symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format `dataobject$varname`
- use a tidyverse `%>%` pipe operator to perform a sequence of actions
- knit your RMD document into an HTML file that you can then save and submit for course credit

- use `here()` for a simple and reproducible self-referential file directory method

- use the base R `head()` function to quickly view a snapshot of your data
- use the `glimpse()` function to quickly view all columns (variables) in your data
- use `sjPlot::view_df()` to quickly browse variables in a data file
- use `attr()` to identify variable and attribute value labels
- recognize when missing values are coded as `NA` for variables in your data file
- select and recode variables using dplyr’s `select()`, `mutate()`, and `if_else()` functions

- use `summarytools::dfSummary()` to quickly describe one or more variables in a data file
- create frequency tables with `sjmisc::frq()` and `summarytools::freq()` functions
- sort frequency distributions (lowest to highest/highest to lowest) with `summarytools::freq()`
- calculate measures of central tendency for a variable distribution using base R functions `mean()` and `median()` (e.g., `mean(data$variable)`)
- calculate central tendency and other basic descriptive statistics for specific variables in a dataset using the `summarytools::descr()` function

- improve some knitted tables by piping a function’s results to `gt()` (e.g., `head(data) %>% gt()`)
- create basic graphs using ggplot2’s `ggplot()` function

*If you do not recall how to do these things, review Assignments
1-5.*

Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:

- measures of dispersion, such as variation ratio, range, interquartile range (IQR), variance, and standard deviation
- the difference between range and IQR
- the relationship between variance and standard deviation
- how to calculate range, variation ratio, and IQR
- how to calculate variance of a population, a sample, and a sample with grouped data
- how to calculate standard deviation of a population, a sample, and a sample with grouped data
- how to calculate sample variance and standard deviation with ungrouped and grouped data using computational formulas
- boxplots, including steps for boxplot construction, elements of a boxplot, and how to read a boxplot to summarize the central tendency and dispersion of a data distribution

As noted previously, for this and all future assignments, you MUST
type all commands in by hand. *Do not copy & paste except for
troubleshooting purposes (i.e., if you cannot figure out what you
mistyped).*

Goal: Read in Youth Data and Determine Measures of Dispersion

*( Note:*

In the last assignment, you learned how to identify or calculate measures of central tendency from frequency tables to summarize the most common or “expected” value of a data distribution. In doing so, you learned how to decide which measures of central tendency are most appropriate or useful for summarizing specific variables. In this assignment, you will use frequency tables and boxplots to calculate measures of and visualize dispersion for several variables.

- Go to your CRIM5305_L folder, which should contain the R Markdown file you created for Assignment 5 (named **YEAR-MO-DY_LastName_CRIM5305_Assign05**). Click to open the R Markdown file.
  - Remember, we open RStudio in this way so the `here` package will automatically set our CRIM5305_L folder as the top-level directory.
  - In RStudio, open a new R Markdown document. If you do not recall how to do this, refer to Assignment 1.
  - The dialogue box asks for a **Title**, an **Author**, and a **Default Output Format** for your new R Markdown file.
  - In the **Title** box, enter *CRIM5305 Assignment 6*.
  - In the **Author** box, enter your First and Last Name (e.g., *Caitlin Ducate*).
  - Under **Default Output Format**, select “Word document”.
- Remember that the new R Markdown file contains a simple pre-populated template to show users how to do basic tasks like add settings, create text headings and text, insert R code chunks, and create plots. Be sure to delete this text before you begin working.
- Create a second-level header titled: “Part 1 (Assignment 6.1).”
- This assignment must be completed by the student and the student alone. To confirm that this is your work, please begin all assignments with this text: This R Markdown document contains *my work* for Assignment 6. It is **my work** and *only* my work.

- Create a third-level header in R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
- Insert an R chunk.
- Inside the new R code chunk, load the following packages: `tidyverse`, `haven`, `here`, `sjmisc`, `sjPlot`, and `summarytools`.
  - Recall, you only need to install packages one time. However, you must load them each time you start a new R session.

- After your first code chunk, create another third-level header in RMD titled: “Read Data into R”
- Insert another R code chunk.
- In the new R code chunk, read and assign the “Youth_0.sav” SPSS datafile into an R data object named `YouthData`.
  - Forget how to do this? Refer to Assignment 1.
- In the same code chunk, on a new line below your read data/assign object command, type the name of your new R data object: `YouthData`. This will call the object and provide a brief view of the data. *(**Note:** You can get a similar but more visually appealing view by simply clicking on the object in the “Environment” window.)* Your RStudio session should now look a lot like this:

`YouthData <- read_spss(here("Datasets", "Youth_0.sav"))`

As in the image, you should see 1,272 rows (or observations) and 23 columns (or variables).

- Now, insert an R chunk, type `YouthData %>% view_df()`, and hit RUN. Check your Viewer tab to get a better look at the variable names, labels, and values.
  - Forget how to do this? Refer to Assignment 2.
- Create a third-level header titled: “Frequency Table for ‘v77’ Variable”
  - Create a new R code chunk and type `YouthData %>% frq(v77)` to generate a frequency table for the variable that measures the ‘parental supervision scale.’ **Note:** *R is case sensitive! Be sure you are typing “v77” with a lower-case, not upper-case, ‘v’.*
  - Your frequency table should look like this:
- Using this frequency table, calculate the variation ratio of the variable and answer question 7 of Assignment 6. **REMEMBER**: You can use R as a calculator. In fact, you can write a line of code that will calculate the value for you. If you go this route, remember to follow the order of operations (e.g., use parentheses in the right places). See my walk-through video if you are curious how to do this.
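
If you want to try using R as a calculator for the variation ratio, the sketch below shows one way it might look. The frequency counts are made up for illustration (they are not the actual `v77` frequencies), and the object names `freqs`, `n`, and `modal_freq` are hypothetical; substitute the counts from your own frequency table.

```r
# Hypothetical frequency counts for a five-category variable
# (these are NOT the actual v77 frequencies from the Youth data)
freqs <- c(`1` = 10, `2` = 25, `3` = 40, `4` = 15, `5` = 10)

n <- sum(freqs)           # total number of valid cases (100 here)
modal_freq <- max(freqs)  # frequency of the modal category (40 here)

# Variation ratio = 1 - (modal frequency / total number of cases)
# The parentheses make the order of operations explicit
variation_ratio <- 1 - (modal_freq / n)
variation_ratio           # 0.6 for these made-up counts
```

A variation ratio closer to 0 means more cases fall in the modal category; a value closer to 1 means cases are spread more widely across the other categories.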

Goal: Determine Measures of Dispersion for `fropinon` Variable

Now, we are going to generate frequency tables for three variables, use these tables to determine measures of dispersion, and then answer Question 5 on page 145 of your book (i.e., standard deviation, variance, range, minimum value, and maximum value). These measures of dispersion will help us infer meaningful information about the spread of these distributions in this sample.

You should have read about how to calculate measures of dispersion by
hand in the book chapter; you can also calculate these directly in R.
For instance, you may have noticed that the frequency table you
generated earlier using `sjmisc::frq()` included the standard
deviation (“sd=”) in the output. You may also recall that the
descriptive statistics table you generated in Assignment 4 using
`summarytools::descr()` included the standard deviation,
along with the minimum value, maximum value, IQR, and other information.
**However, for this part of this assignment, you should be able to
generate the frequency tables in R and then calculate all dispersion
measures by hand.** This will help you better understand what the
programs are reporting and how they generated these measures. If you
want to read more about measures of dispersion and how to calculate them
in R, you might want to check out here and here.
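
To see how a hand calculation lines up with what R reports, here is a minimal sketch that computes the sample variance and standard deviation step by step and then compares them with base R's `var()` and `sd()`. The scores in `x` are made up for illustration (they are not from the Youth data), and all object names are hypothetical.

```r
# Hypothetical scores (NOT from the Youth data)
x <- c(2, 4, 4, 5, 7, 8)

n <- length(x)                    # number of cases: 6
x_bar <- mean(x)                  # sample mean: 5
sum_sq_dev <- sum((x - x_bar)^2)  # sum of squared deviations: 24

# Sample variance divides by n - 1; the standard deviation is its square root
var_by_hand <- sum_sq_dev / (n - 1)  # 24 / 5 = 4.8
sd_by_hand <- sqrt(var_by_hand)

# Base R's var() and sd() also use the n - 1 (sample) denominator
var(x)
sd(x)

# With missing values, sd() returns NA unless you add na.rm = TRUE
x_with_na <- c(x, NA)
sd(x_with_na, na.rm = TRUE)
```

Because `var()` and `sd()` use the sample formulas, they should match your hand calculations for sample data. They will not match a population formula that divides by n.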

- Create a second-level header titled: “Part 2 (Assignment 6.2).” Then, create a third-level header titled: “Calculate Measures of Dispersion for `fropinon`, `delinquency`, and `certain`”
  - **Note:** The `fropinon` variable is a five-category ordinal measure asking respondents how wrong they think their friends think it is to steal. Responses range from 1 (always wrong) to 5 (never wrong). However, the variable name is misspelled: instead of “fropinion” with two i’s, the variable is `fropinon` with one ‘i’. **Be sure to spell the variable as it is found in the dataset when referencing it in R code chunks**. Also, remember that R is case sensitive. So, if you type “fropinion” or `Fropinon` instead, R will not be able to find the variable!
  - Also note that this is why learning to use RStudio’s ability to autocomplete is valuable. As long as I type the first few letters of a variable, it will complete the name for me, ensuring my variable is typed correctly. Remember that you can use TAB to autocomplete.
- Create a new R code chunk and type `YouthData %>% frq(fropinon)`
  - Repeat the above step for the other 2 variables, `delinquency` and `certain`. Before each new R chunk, create a third-level header titled: “Frequency Table of [Variable Name]”. For example, when you create the frequency table for the `delinquency` variable, create a third-level header above it titled “Frequency Table of ‘delinquency’”.
  - **NOTE**: If you just want to calculate the standard deviation, you can type `sd(data$varname)`, where you substitute the name of the data set for `data` and the name of the variable for `varname`.
  - Then, answer questions 9-12.

Graphical representations can be helpful, especially for judging the
shape of a distribution (e.g., skew). They can also help to determine
measures of dispersion, such as range and interquartile range. In the
next section, you will create a boxplot for `fropinon`.
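
Since a boxplot is built from the minimum, quartiles, and maximum, it can help to see those numbers directly. The sketch below uses made-up scores (not the Youth data) and hypothetical object names. Note that R's `quantile()` uses a default interpolation method that can give slightly different quartiles than the hand method in your book for some datasets, so treat it as a check rather than a replacement for your hand calculations.

```r
# Hypothetical scores on a 1-5 scale (NOT from the Youth data)
x <- c(1, 1, 2, 2, 3, 3, 4, 5, 5)

min_x <- min(x)
max_x <- max(x)
range_x <- max_x - min_x  # range as a single number: 4
# Note: range(x) returns the minimum and maximum, not their difference

# First quartile, median, and third quartile (R's default method)
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))

# Interquartile range = third quartile - first quartile
iqr_x <- IQR(x)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(x)
five_num
```

For these made-up scores, the box would span from 2 to 4 with the median line at 3, and the whiskers would reach 1 and 5.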

- Create a third-level header titled: “Basic Boxplot of `fropinon`”
  - Insert an R chunk. You can create a simple boxplot using base R by typing `boxplot(YouthData$fropinon)`. Recall that the `$` is a base R operator used to reference an element (variable) within an object (dataset).
  - Your RStudio should look like this:

- The base R `boxplot()` function we used above creates a boxplot of any variable. However, with the base R plotting functions, it is difficult to manipulate and save the boxplot if desired. Rather, we recommend using the `ggplot()` function (from the `ggplot2` package) to generate plots instead. Below, we will show you how to create a boxplot using `ggplot()`, whose properties (including its colors, titles, and layout orientation) you can then customize.

- Create a third-level header titled: “Boxplot of `fropinon` using ggplot()”

- Insert another R chunk and type `YouthData %>% ggplot(aes(fropinon)) + geom_boxplot()`.
  - Recall, `ggplot()` is a function from the `ggplot2` package (loaded as part of `tidyverse`) that allows us to create graphs and plots.
  - The `aes()` function maps variables to aesthetic properties of the plot, such as the axis a variable is drawn on. For example, plots will orient to the x-axis by default if you type `ggplot(aes(fropinon))`. If you type `ggplot(aes(y=fropinon))`, the plot will be flipped to the y-axis like the base R boxplot above.
  - The `geom_boxplot()` function works like the `geom_histogram()` function you used in earlier assignments. Be sure to include the `+` sign before `geom_boxplot()` since you are “adding” this geometric object layer to the initial XY coordinate plot. **Note:** If you break your code into multiple lines (as pictured below), be sure that the `+` sign is at the end of the same line as the `ggplot()` function. Otherwise, R will assume you’re done with the `ggplot()` function, and it will not understand that you want to add a boxplot to it.
  - Your RStudio should look like this:

- Next, we can add some color.
  - Create a third-level header titled: “Add Color to `fropinon` boxplot”. Then, create a new R chunk and type `YouthData %>% ggplot(aes(fropinon)) + geom_boxplot()`.
  - Inside the parentheses after `geom_boxplot`, type `fill = "turquoise", color = "black"`. `fill =` dictates the inner color of the boxplot. `color =` dictates the color of the outline and lines comprising the boxplot. Be sure to include the quotation marks (“”).

```
YouthData %>%
  ggplot(aes(fropinon)) +
  geom_boxplot(fill = "turquoise", color = "black")
```
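
If you would like to experiment with hexadecimal color codes or a vertical orientation, here is a minimal sketch on a made-up data frame. The `toy_data` object and its `score` column are hypothetical stand-ins (they are not the Youth data); the hex codes are the crimson and cream examples mentioned at the start of this assignment.

```r
library(ggplot2)

# Hypothetical data standing in for YouthData (values are made up)
toy_data <- data.frame(score = c(1, 1, 2, 2, 3, 3, 4, 5, 5))

# Hex codes can be used anywhere a color name like "turquoise" can
# aes(y = score) draws the boxplot vertically, like base R's boxplot()
toy_plot <- ggplot(toy_data, aes(y = score)) +
  geom_boxplot(fill = "#EDEBEB", color = "#990000")

toy_plot
```

Saving the plot to an object (here, `toy_plot`) is optional, but it lets you print the plot again later or keep adding layers to it with `+`.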