The purpose of this fifth assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 5 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.
This chapter covered measures of dispersion, including variation ratio, range, interquartile range, variance, and standard deviation. We use measures of dispersion to summarize the “spread” (rather than central tendency) of a data distribution. Likewise, in this assignment, you will learn how to use R to calculate measures of dispersion and create boxplots that help us standardize and efficiently describe the spread of a data distribution. You will also get additional practice with creating frequency tables and simple graphs in R, and you will learn how to modify some elements (e.g., color) of a ggplot object. As with previous assignments, you will be using R Markdown (with R & R Studio) to complete and submit your work.
Calculate measures of dispersion in R (e.g., with sjmisc::frq() or summarytools::descr())
Use boxplot() and ggplot() to visualize dispersion in a data distribution
Modify elements of a ggplot object (e.g., geom_boxplot()) by adding fill= and color= followed by specific color names (e.g., “turquoise”) or hexadecimal codes (e.g., “#990000” for crimson; “#EDEBEB” for cream)
Add a theme (e.g., + theme_minimal()) to a ggplot object to conveniently modify certain plot elements (e.g., white background color)
Add a title with the labs() function (e.g., + labs(title = "My Title"))
We are building on objectives from Assignments 1-4. By the start of this assignment, you should already know how to:
Call a function using the package::function() format
Read in an SPSS data file with haven::read_spss() and assign it to an R object using an assignment (<-) operator
Use the $ symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format dataobject$varname
Use the %>% pipe operator to perform a sequence of actions
Use here() for a simple and reproducible self-referential file directory method
Use the head() function to quickly view a snapshot of your data
Use the glimpse() function to quickly view all columns (variables) in your data
Use sjPlot::view_df() to quickly browse variables in a data file
Use attr() to identify variable and attribute value labels
Recognize values coded as NA for variables in your data file
Use the select(), mutate(), and if_else() functions
Use summarytools::dfSummary() to quickly describe one or more variables in a data file
Use the sjmisc::frq() and summarytools::freq() functions
Use mean() and median() (e.g., mean(data$variable))
Use the summarytools::descr() function
Use gt() (e.g., head(data) %>% gt())
Use the ggplot() function
If you do not recall how to do these things, review Assignments 1-4.
Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:
As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
Goal: Read in Youth Data and Determine Measures of Dispersion
(Note: Remember that, when following instructions, always substitute “LastName” for your own last name and substitute YEAR-MO-DY for the actual date. E.g., 2023-02-02_Ducate_CRIM5305_Assign06)
In the last assignment, you learned how to identify or calculate measures of central tendency from frequency tables to summarize the most common or “expected” value of a data distribution. In doing so, you learned how to decide which measures of central tendency are most appropriate or useful for summarizing specific variables. In this assignment, you will use frequency tables and boxplots to calculate measures of dispersion and to visualize dispersion for several variables.
The here package will automatically set our CRIM5305_L folder as the top-level directory.
Load the following packages: tidyverse, haven, here, sjmisc, sjPlot, and summarytools.
Read in the data file and assign it to an R object named YouthData.
Type YouthData. This will call the object and provide a brief view of the data. (Note: You can get a similar but more visually appealing view by simply clicking on the object in the “Environment” window.) Your RStudio session should now look a lot like this:
YouthData <- read_spss(here("Datasets", "Youth_0.sav"))
As in the image, you should see 1,272 rows (or observations) and 23 columns (or variables).
Type YouthData %>% view_df(), and hit RUN. Check your Viewer tab to get a better look at the variable names, labels, and values.
Type YouthData %>% frq(v77) to generate a frequency table for the variable that measures the ‘parental supervision scale.’
Goal: Determine Measures of Dispersion for fropinon Variable
Now, we are going to generate frequency tables for three variables, use these tables to determine measures of dispersion, and then answer Question 5 on page 145 of your book (i.e., standard deviation, variance, range, minimum value, and maximum value). These measures of dispersion will help us infer meaningful information about the spread of these distributions in this sample.
You should have read about how to calculate measures of dispersion by
hand in the book chapter; you can also calculate these directly in R.
For instance, you may have noticed that the frequency table you
generated earlier using sjmisc::frq()
included the standard
deviation (“sd=”) in the output. You may also recall that the
descriptive statistics table you generated in Assignment 4 using
summarytools::descr()
included the standard deviation,
along with the minimum value, maximum value, IQR, and other information.
However, for this part of this assignment, you should be able to
generate the frequency tables in R and then calculate all dispersion
measures by hand. This will help you better understand what the
programs are reporting and how they generated these measures. If you
want to read more about measures of dispersion and how to calculate them
in R, you might want to check out here and here.
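As a rough illustration of the hand calculations described above, the sketch below works from a frequency table to the variation ratio, range, variance, and standard deviation, and then compares the results with R's built-in functions. The data vector is made up for illustration and does not come from the Youth data file; the variance uses the n - 1 (sample) denominator, which is also what var() and sd() use.

```r
# Hypothetical responses on a 1-5 scale (made-up data, NOT from Youth_0.sav)
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 5)

# Frequency table: each distinct value and how often it occurs
freq   <- table(x)
values <- as.numeric(names(freq))
counts <- as.numeric(freq)
n      <- sum(counts)

# Variation ratio: proportion of cases NOT in the modal category
variation_ratio <- 1 - max(counts) / n

# Range: maximum value minus minimum value
data_range <- max(values) - min(values)

# Sample variance and standard deviation computed from the frequency table
mean_x        <- sum(values * counts) / n
variance_hand <- sum(counts * (values - mean_x)^2) / (n - 1)
sd_hand       <- sqrt(variance_hand)

# Compare the hand calculations with R's built-in functions
all.equal(variance_hand, var(x))
all.equal(sd_hand, sd(x))
```

If the two all.equal() calls return TRUE, the by-hand steps and the built-in functions agree for this toy vector.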
fropinon, delinquency, and certain”
The fropinon variable is a five-category ordinal measure asking respondents how wrong they think their friends think it is to steal. Responses range from 1 (always wrong) to 5 (never wrong). However, the variable is misspelled: instead of “fropinion” with two i’s, the variable is fropinon with one ‘i’. Be sure to spell the variable as it is found in the dataset when referencing it in R code chunks. Also, remember that R is case sensitive. So, if you type “fropinion” or “Fropinon” instead, R will not be able to find the variable!
YouthData %>% frq(fropinon)
Repeat this process for delinquency and certain. Before each new R chunk, create a third-level header titled: “Frequency Table of [Variable Name]”. For example, when you create the frequency table for the delinquency variable, create a third-level header above it titled “Frequency Table of ‘delinquency’”.
To calculate a standard deviation directly in R, type sd(data$varname), where you substitute the name of the data set for data and the name of the variable for varname.
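The sd(data$varname) pattern can be sketched as follows. The data frame and column names here (toy_data, acts) are hypothetical stand-ins rather than objects from the assignment. The example also shows one thing to watch for: if the variable contains missing values, sd() returns NA unless you add na.rm = TRUE.

```r
# Hypothetical data set standing in for your own data object (made-up values)
toy_data <- data.frame(acts = c(0, 2, 4, 4, NA, 10))

# With a missing value present, sd() returns NA
sd_with_na <- sd(toy_data$acts)

# na.rm = TRUE drops missing values before calculating
sd_clean    <- sd(toy_data$acts, na.rm = TRUE)
var_clean   <- var(toy_data$acts, na.rm = TRUE)
range_clean <- range(toy_data$acts, na.rm = TRUE)  # minimum and maximum

sd_with_na
sd_clean
var_clean
range_clean
```

The same na.rm = TRUE argument works for mean(), median(), min(), and max().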
Graphical representations can be helpful, especially for determining
the shape (or skew) of a distribution. They can also help to determine
measures of dispersion, such as range and interquartile range. In the next section,
you will create a boxplot for fropinon.
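To see which numbers a boxplot is summarizing, it may help to compute them directly. The sketch below uses a made-up vector (not the Youth data) with base R's fivenum(), IQR(), and boxplot(). Setting plot = FALSE asks boxplot() to return its statistics without drawing anything. Note that the hinges a boxplot uses can differ slightly from the quartiles you calculate by hand, depending on the method.

```r
# Hypothetical scores (made-up data, NOT from Youth_0.sav)
x <- c(1, 1, 2, 2, 2, 3, 3, 4, 5)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(x)

# Interquartile range: distance between the 75th and 25th percentiles
iqr <- IQR(x)

# Statistics behind the boxplot, computed without drawing the plot
bp <- boxplot(x, plot = FALSE)

five_num
iqr
bp$stats  # whisker ends, hinges, and median used to draw the box
bp$out    # values plotted as individual outlier points
```

Comparing five_num with bp$stats shows that a whisker stops at the most extreme value within 1.5 times the box length of the hinge, so it does not always reach the maximum.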
fropinon”
Type boxplot(YouthData$fropinon). Recall that the $ is a base R operator used to reference an element (variable) within an object (dataset).
The boxplot() function we used above creates a boxplot of any variable. However, with the base R plotting functions, it is difficult to manipulate and save the boxplot if desired. Instead, we recommend using the ggplot() function (from the ggplot2 package) to generate plots. Below, we will show you how to create a boxplot using ggplot(), whose properties (including its colors, titles, and layout orientation) you can then customize.
fropinon using ggplot()”
Type YouthData %>% ggplot(aes(fropinon)) + geom_boxplot().
ggplot() is a function included in the ggplot2 package (part of the tidyverse) that allows us to create graphs and plots.
The aes() function maps variables to the aesthetics of the graph or plot, such as its orientation. For example, plots will orient to the x-axis by default if you type ggplot(aes(fropinon)). If you type ggplot(aes(y=fropinon)), the plot will be flipped to the y-axis like the base R boxplot above.
The geom_boxplot() function works like the geom_histogram() function you used in earlier assignments. Be sure to include the + sign before geom_boxplot() since you are “adding” this geometric object layer to the initial XY coordinate plot.
Make sure the + sign is on the same line as the ggplot() function. Otherwise, R will assume you’re done with the ggplot() function, and it will not understand that you want to add a boxplot to it.
fropinon boxplot”. Then, create a new R chunk and type YouthData %>% ggplot(aes(fropinon)) + geom_boxplot().
Inside the parentheses of geom_boxplot, type fill = "turquoise", color = "black".
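The orientation and + placement points above can be illustrated with a small sketch. It uses a made-up data frame (toy_data with a hypothetical opinion column) instead of YouthData so that it runs on its own, and it assumes a reasonably recent version of ggplot2 in which a boxplot can be drawn from an x aesthetic alone.

```r
library(ggplot2)

# Hypothetical five-category responses (made-up data, NOT from Youth_0.sav)
toy_data <- data.frame(opinion = c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5))

# Horizontal boxplot: the variable is mapped to the x-axis.
# The + sits at the END of the line, so R knows another layer follows.
p_horizontal <- ggplot(toy_data, aes(x = opinion)) +
  geom_boxplot()

# Vertical boxplot: the same variable mapped to the y-axis
p_vertical <- ggplot(toy_data, aes(y = opinion)) +
  geom_boxplot()

p_horizontal
p_vertical
```

If the + were moved to the start of the geom_boxplot() line, R would treat the ggplot() line as a complete command and then fail on the dangling + geom_boxplot().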
fill = dictates the inner color of the boxplot.
color = dictates the color of the outline and lines comprising the boxplot. Be sure to include the quotation marks ("").
YouthData %>%
  ggplot(aes(fropinon)) +
  geom_boxplot(fill = "turquoise", color = "black")
Type + labs(title = "Boxplot of Friends' Opinions on Stealing") after geom_boxplot(fill = "turquoise", color = "black"). If you break across lines, remember to include the + at the end of the previous line and not at the beginning of the new line.
labs() is a function that allows you to change labels.
title = designates that you’re working with the boxplot title.
Congratulations! You just learned how to create and modify a (jazzed up) boxplot in R!
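Putting the customization pieces together, a minimal sketch might look like the following. The data frame and title are hypothetical, the fill uses the hexadecimal code for crimson mentioned in the objectives instead of a color name, and theme_minimal() is added as one way to get a white background.

```r
library(ggplot2)

# Hypothetical five-category responses (made-up data, NOT from Youth_0.sav)
toy_data <- data.frame(opinion = c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5))

p_custom <- ggplot(toy_data, aes(x = opinion)) +
  geom_boxplot(fill = "#990000", color = "black") +         # hex fill, black outline
  labs(title = "Boxplot of Hypothetical Opinion Scores") +  # plot title
  theme_minimal()                                           # white background

p_custom
```

Swapping "#990000" for a color name such as "turquoise" works the same way; both forms must be wrapped in quotation marks.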
Goal: Determine Measures of Dispersion for delinquency and certain Variables (Question 5, Ch. 5, p. 145)
Now that you can create a boxplot in R, you will create boxplots for
the delinquency
and certain
variables as well.
You will do this using the method from above.
delinquency and certain (Question 5, Ch. 5, p. 145)”. Then, create a fourth-level header (type ####) titled: “Boxplot for delinquency”
Create a boxplot for the delinquency variable. Your RStudio should look like this:
Now, add colors to the boxplot by typing fill = "blue", color = "black" in the parentheses of geom_boxplot(). Then, add a title that says “Boxplot of Number of Delinquent Acts”. To do this, type + labs(title = "Boxplot of Number of Delinquent Acts") after geom_boxplot(fill = "blue", color = "black").
Finally, save your boxplot as a PNG file (i.e., an image file) using ggsave(). Use the code below and name your file LASTNAME_DelinquencyBoxPlot.png, replacing LASTNAME with your own last name.
YouthData %>%
  ggplot(aes(delinquency)) +
  geom_boxplot(fill = "blue", color = "black") +
  labs(title = "Boxplot of Number of Delinquent Acts")
ggsave(here("Ducate_DelinquencyBoxPlot.png"))
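When ggsave() is called with only a file name, it saves the most recently displayed plot. A slightly more explicit variant, sketched here with made-up data, stores the plot in an object and passes it through the plot argument along with a size. The temporary folder is used only so the example runs anywhere; in the assignment you would build the path with here() as shown above.

```r
library(ggplot2)

# Hypothetical counts of delinquent acts (made-up data, NOT from Youth_0.sav)
toy_data <- data.frame(acts = c(0, 1, 1, 2, 3, 3, 4, 8))

# Store the plot in an object so it can be passed to ggsave() by name
p <- ggplot(toy_data, aes(x = acts)) +
  geom_boxplot(fill = "blue", color = "black") +
  labs(title = "Boxplot of Hypothetical Delinquent Acts")

# Hypothetical output location inside R's temporary folder
out_file <- file.path(tempdir(), "LASTNAME_DelinquencyBoxPlot.png")

# Save the named plot as a 6 x 4 inch PNG at 300 dots per inch
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)

file.exists(out_file)
```

Naming the plot explicitly avoids accidentally saving a different plot if you have displayed another one in between.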
Now repeat this process with the certain variable.
certain”
You should now have everything that you need to complete the questions in Assignment 6! Remember to:
Can you calculate measures of dispersion in R (e.g., with sd(), sjmisc::frq(), or summarytools::descr())?
Can you use boxplot() and ggplot() to visualize dispersion in a data distribution?
Can you modify elements of a ggplot object (e.g., geom_boxplot()) by adding fill= and color= followed by specific color names (e.g., “turquoise”) or hexadecimal codes (e.g., “#990000” for crimson)?
Can you add a title with the labs() function (e.g., + labs(title = "My Title"))?