The purpose of this fourth assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 3 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.
This chapter focused on data distributions and displaying data with tabular or graphical representations. As with the previous assignments, you will be using R Markdown (with R & RStudio) to complete and present your work. In this assignment, you will learn how to recode variables, generate frequency tables, and create simple graphs in R.
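- Use the `ggplot()` function from the `ggplot2` package to generate basic bar charts and histograms
- Use the `mutate()` and `if_else()` functions from the `dplyr` package to recode variables
- Understand how the `if_else()` function works
- Save a plot with `ggsave()`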
We are building on objectives from Assignments 1-3. By the start of this assignment, you should already know how to:
- call functions using the `package::function()` format
- read in an SPSS data file with `haven::read_spss()` and assign it to an R object using an assignment (`<-`) operator
- use the `$` symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format `dataobject$varname`
- use the `%>%` pipe operator to perform a sequence of actions
- use `here()` for a simple and reproducible self-referential file directory method
- use `sjPlot::view_df()` to quickly browse variables in a data file
- use `attr()` to identify variable and attribute value labels

If you do not recall how to do these things, first review Assignments 1, 2, & 3.
Additionally, you should have read the assigned book chapters and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:
As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
Goal: Create a new RMD file for Assignment 4
(Note: Remember that, when following instructions, always substitute “LastName” for your own last name and substitute YEAR-MO-DY for the actual date. E.g., 2022-09-01_Ducate_CRIM5305_Assign3)
In the second assignment, you learned how to read in and assign a dataset to an R object. You also learned how to use the `view_df()` function from the `sjPlot` package and the base R `attr()` function to display your dataframe and identify variable attributes. In the third assignment, you learned to use the `sjmisc` and `summarytools` packages to display your descriptive data in frequency tables. You also learned about the `dfSummary()` function from the `summarytools` package, which is an alternative to `sjPlot::view_df()` for creating a useful summary of all or a subset of the variables in a dataset.
In this fourth assignment, you will be reminded how to display your descriptive data in frequency tables. Additionally, you will learn how to select and recode variables using the `select()`, `mutate()`, and `if_else()` functions from the “dplyr” package, and how to display your data in basic bar charts or histograms using the `ggplot()` function from the “ggplot2” package.
The `here` package will automatically set our CRIM5305_L folder as the top-level directory.
Goal: Read in and Identify Characteristics of Lone Offender Assault NCVS Data
We will be working with the 1992 to 2013 NCVS Lone Assault data, which details individual experiences with criminal victimization. You’ll begin by reading this dataset in and displaying the variable view using `sjPlot::view_df()`.
Then, you will need to answer the questions regarding levels of measurement and graphs on Assignment 4. To answer these questions, you will need to view the “injured”, “maleoff”, “age_r”, and “V2129” variables. That is what we will do next.
Create a second-level header titled: “Part 2 (Assignment 4.2)”
Remember, a second-level heading starts with two hashtags followed by a space and the heading title, like this: ## Heading Title
A third-level heading starts with three hashtags: ### Heading Title
A fourth-level heading starts with four hashtags: #### Heading Title
Now, you need to get data into RStudio. You already know how to do this, but please refer to Assignment 1 if you have questions.
First, we need to load in our libraries.
Create a third-level header in your R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
Insert an R code chunk
Inside the new R code chunk, load the following six packages: `tidyverse`, `haven`, `here`, `sjmisc`, `sjPlot`, and `summarytools`.
You should have all of these packages installed, but if you don’t, please install them using the `install.packages()` command. Remember, you only need to install a package once, but you must load a package each time you start a new R session and need to use the package.
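If you are unsure whether all six packages are already installed, a quick check like the following can help. This is only a sketch: `pkgs` and `missing_pkgs` are names made up for this illustration, and the install line is commented out so that nothing gets installed until you decide to run it.

```r
# The six packages this assignment asks you to load
pkgs <- c("tidyverse", "haven", "here", "sjmisc", "sjPlot", "summarytools")

# Compare against the packages already installed on this computer (base R)
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# An empty result, character(0), means nothing is missing
missing_pkgs

# If anything is listed, install it once (uncomment to run):
# install.packages(missing_pkgs)
```

Once everything is installed, you still need to load each package with `library()` at the start of every new R session.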
After your first code chunk, create another third-level header in RMD titled: “Read Data into R”
Insert another R code chunk.
In the new R code chunk, read and assign the “NCVS lone offender assaults 1992 to 2013.sav” SPSS datafile into an R data object named `NCVS1992to2013`.
NCVS1992to2013 <- read_spss(here("Datasets", "NCVS lone offender assaults 1992 to 2013.sav"))
In the same code chunk, on a new line below your read data/assign object command, type the name of your new R data object: `NCVS1992to2013`.
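If reading the file fails, it often helps to look at the path that `here()` builds before passing it to `read_spss()`. The following is a minimal sketch, assuming (as in the read command above) that the file sits in a Datasets folder inside your project; `data_path` is a name chosen just for this illustration.

```r
library(here)

# Build the full path, starting from the project's top-level folder
data_path <- here("Datasets", "NCVS lone offender assaults 1992 to 2013.sav")

# Print the path so you can see where R will look for the file
data_path

# TRUE means the file was found; FALSE usually points to a misspelled
# folder or file name, or to a file saved somewhere else
file.exists(data_path)
```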
Now create a third-level header titled: Describing “injured”, “maleoff”, “age_r”, and “V2129” variables
a. Call your data object, `NCVS1992to2013`, and view the variable summary in the “Viewer” tab using `data %>% view_df()` (here, `NCVS1992to2013 %>% view_df()`). Your “Viewer” tab in RStudio should then display the variable summary.
b. You should now create a frequency table for each variable to determine the level of measurement: whether a variable is numeric or alphanumeric, binary, rank-ordered, etc.
- Create separate R code chunks for each frequency table, and include headers (e.g., fourth level header: “Frequency table for ‘injured’ variable”) above each table so we can easily tell what the table is.
- **Don't remember how to make frequency tables?** Try `NCVS1992to2013 %>% freq(VARIABLE)`, where VARIABLE is replaced by the name of the variable (e.g., `NCVS1992to2013 %>% freq(injured)`)
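- Frequency table for “age_r”: `NCVS1992to2013 %>% freq(age_r)`.
- Bar chart of “injured”: `NCVS1992to2013 %>% ggplot(aes(injured)) + geom_bar()`.
- Histogram of “injured”: `NCVS1992to2013 %>% ggplot(aes(injured)) + geom_histogram()`.
- Histogram of “age_r”:

NCVS1992to2013 %>%
  ggplot(aes(age_r)) +
  geom_histogram()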
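The difference between the two geometries is easier to see on a handful of values you can count by eye. The sketch below uses a small made-up data frame (`toy`, with invented `injured` and `age_r` values, not the NCVS data). It also sets `binwidth = 10` explicitly, which is an addition for this illustration; the commands above rely on the default bins. Writing `ggplot(toy, aes(...))` is equivalent to `toy %>% ggplot(aes(...))`.

```r
library(ggplot2)

# Made-up values for illustration only (not from the NCVS file)
toy <- data.frame(
  injured = c(0, 0, 1, 0, 1, 1, 0, 0),
  age_r   = c(14, 16, 19, 23, 23, 31, 47, 62)
)

# Bar chart: one bar per distinct value, suited to categories like injured
bar_plot <- ggplot(toy, aes(injured)) +
  geom_bar()

# Histogram: groups a numeric variable into bins, suited to age_r
hist_plot <- ggplot(toy, aes(age_r)) +
  geom_histogram(binwidth = 10)

# Same bar chart with the variable mapped to y, which turns the bars sideways
flipped_plot <- ggplot(toy, aes(y = injured)) +
  geom_bar()

bar_plot
hist_plot
flipped_plot
```

In the bar chart, the two bars simply count how many toy cases have each value of `injured` (five 0s and three 1s).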
`ggplot()` is a function in the `ggplot2` package (which, like `haven` and `dplyr`, is part of the tidyverse) that allows us to create graphs and plots. We will cover some basic options for editing elements of a ggplot object in later assignments. For now, here are a few things to note:
- The `aes()` function sets the aesthetics of the graph or plot, such as the orientation. In essence, this is the part of the code that sets up the XY background for your plot. For example, plots will orient to the x-axis by default if you type `ggplot(aes(variable))` as we did above. Alternatively, if you type `ggplot(aes(y=variable))`, the plot aesthetic will change by flipping its orientation to the y-axis.
- Next, we add a geometric object such as `geom_bar()` or `geom_histogram()` to the object. To do this, we literally “add” the geometric object layer to the XY coordinate plot by including a `+` sign before it.
- The basic format is: `data %>% ggplot(aes(variable)) + geom_type`
- Make sure the `+` sign is on the same line as the `ggplot()` function. Otherwise, R will assume you’re done with the `ggplot()` function, and it will not understand that you want to add a geometric object to it.

Goal: Recode and Create Frequency Table for “Vic18andoverbin” Variable
In the remainder of the exercise, we are interested in the “age_r” variable and determining the proportion of victims who experienced assaults before they were 18. You can do this by recoding the variable and then creating a frequency table, which will display proportions or percentages along with frequencies for your recoded variable.
- To recode the variable, we will use the `mutate()` function in the `dplyr` package. We will also use the `if_else()` function, which represents a ‘yes or no’ test within R.
- Note that there are two similarly named functions, `ifelse()` and `if_else()`. Be sure to use `if_else()`, the one WITH the underscore ( _ ).
- Insert an R chunk and type `NCVS1992to2013 <- NCVS1992to2013 %>% mutate(Vic18andoverbin = if_else(age_r < 18, 0, 1))`.
- The `mutate()` function creates a new variable by recoding the “age_r” variable according to our `if_else()` (i.e., true, false) logic statement. If “age_r” (the original variable) is less than 18, then R assigns a value of “0” to our new variable. If “age_r” is not less than 18 (i.e., if the value of the variable, or the victim’s age, is 18 or older and is not missing), then R assigns a value of “1” to the new variable. If “age_r” is missing, the new variable is also missing (NA). The new binary indicator variable is named `Vic18andoverbin`.
- Then type `NCVS1992to2013 %>% view_df()` to see that the range of the variable, Vic18andoverbin, is 0-1.

NCVS1992to2013 <- NCVS1992to2013 %>%
  mutate(
    # if_else() codes values as 0 if victims were under 18 and 1 if
    # they were 18 or older
    Vic18andoverbin = if_else(age_r < 18, 0, 1)
  )
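To check the recode logic on values you can verify by eye, the same `mutate()` and `if_else()` pattern can be tried on a tiny made-up tibble first. In this sketch, `toy_ages` and its five ages are invented for illustration (they are not NCVS data), and the `count()` step is a hand-rolled stand-in for a frequency table rather than the `freq()` function used in this course.

```r
library(dplyr)

# Invented ages for illustration; NA stands in for a missing age
toy_ages <- tibble(age_r = c(12, 17, 18, 45, NA))

toy_recoded <- toy_ages %>%
  mutate(
    # 0 = under 18, 1 = 18 or older; a missing age stays missing (NA)
    Vic18andoverbin = if_else(age_r < 18, 0, 1)
  )

toy_recoded

# Frequencies and proportions for the new variable, computed by hand.
# The NA row is counted too, so it is part of the denominator here.
toy_table <- toy_recoded %>%
  count(Vic18andoverbin) %>%
  mutate(prop = n / sum(n))

toy_table
```

Ages 12 and 17 are coded 0, ages 18 and 45 are coded 1, and the missing age stays NA, which is the behavior to expect from the same command on “age_r”.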
Congratulations! You just recoded your first variable into a meaningful and informative binary variable.
For the last part of the assignment, let’s visualize this variable by creating a bar chart.
- Create the bar chart using the format `data %>% ggplot(aes(variable)) + geom_bar()`. Remember to call the correct data object and variable (i.e., you saved your new variable back into the NCVS1992to2013 data object)! In this case, your code should be `NCVS1992to2013 %>% ggplot(aes(Vic18andoverbin)) + geom_bar()`.
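- Save your bar chart as an image file with `ggsave()`. Use the code below and name your file `LASTNAME_Vic18andoverbin.png`, replacing LASTNAME with your own last name.

Vic18andoverplot <- NCVS1992to2013 %>%
  ggplot(aes(Vic18andoverbin)) +
  geom_bar()
# plot = tells ggsave() exactly which plot object to write to the file
ggsave(here("Ducate_Vic18andoverbin.png"), plot = Vic18andoverplot)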
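`ggsave()` lets you name the plot to save and set the image size explicitly, which makes the saved file less dependent on whichever plot happened to be drawn last. Here is a minimal sketch, assuming a made-up data frame (`toy`) and a temporary folder so that nothing lands in your project; in your assignment you would keep `here()` and your own file name.

```r
library(ggplot2)

# Made-up 0/1 values for illustration only (not from the NCVS file)
toy <- data.frame(Vic18andoverbin = c(0, 1, 1, 1, 0, 1))

toy_plot <- ggplot(toy, aes(Vic18andoverbin)) +
  geom_bar()

# Save to a temporary folder; replace LASTNAME with your own last name
out_file <- file.path(tempdir(), "LASTNAME_Vic18andoverbin.png")

# plot = names the plot to save; width and height are in inches by default
ggsave(out_file, plot = toy_plot, width = 6, height = 4)

# TRUE confirms that the image file was written
file.exists(out_file)
```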
You should now have everything that you need to complete the questions in Assignment 4 that parallel those from B&P’s SPSS Exercises for Chapter 3!
- Can you use the `ggplot()` function from the “ggplot2” package to generate basic bar charts and histograms?
- Can you use the `mutate()` and `if_else()` functions from the “dplyr” package?
- Can you save a plot with `ggsave()`?