The purpose of this second assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 1 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.
Following Assignment 1, you will create an R Markdown file in which to save and present your work for this class. Additionally, for this assignment, you will use R/RStudio to view variables in a data file and find information about variables, including variable names, labels, and values. This assignment should help you navigate R/RStudio and become comfortable finding basic information within data files.
view_df() function from the “sjPlot” package to quickly browse variables in a data file
The pipe (%>%) coding operator (from the “magrittr” package) to link together sequenced actions, such as calling a data object then applying a function to a variable in that data object
attr() function from base R to identify variable and attribute value labels
$ symbol can be used to call a specific element (e.g., a variable or column) within an object (e.g., dataframe or tibble)
NA for variables in your data file
package::function() format
We are building on Assignment 1 objectives. By the start of this assignment, you should already know how to:
here() for a simple and reproducible self-referential file directory method
groundhog.library() as an optional but recommended reproducible alternative to library() for loading packages
If you do not recall how to do these things, first review Assignment 1.
Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:
As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
Goal: Create a new R Markdown file in which to complete Assignment 2.
(Note: Remember that, when following instructions, always replace “LastName” with your own last name and replace YEAR_MO_DY with the actual date. E.g., 2022_05_20_Fordham_K300Assign2_RMD)
In the first assignment, you learned how to create a new R Markdown file and use it to write and run R code, and make comments. You also saw how running certain commands (e.g., read_spss()) from an R Markdown file will generate results in the RStudio Console and learned how to assign the results of such commands into an R object. In Assignment 2, you will learn how to read in and assign datasets as an R object. You will also learn how to use the sjPlot package to quickly view variables with its view_df() function and to use the base R attr() function to identify variable labels and variable attribute value labels.
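As a rough illustration of reading an SPSS file and assigning it to an R object, here is a minimal sketch. It assumes the haven package is installed. The names toy_data, toy_path, and ToyData are hypothetical and exist only for this example; in the assignment itself, you would point read_spss() at the course data file rather than a temporary file.

```r
# Minimal sketch: write a toy SPSS file, then read it back and assign it.
# toy_data, toy_path, and ToyData are hypothetical names for illustration.
library(haven)

toy_data <- data.frame(age = c(15, 16, 17), grade = c(9, 10, 11))

# Save the toy data as a temporary .sav file so the example runs anywhere.
toy_path <- tempfile(fileext = ".sav")
write_sav(toy_data, toy_path)

# Read the SPSS file and assign the result to an R object with `<-`.
ToyData <- read_spss(toy_path)

# Typing the object's name calls the object and prints a brief view of it.
ToyData
```

Because the result is assigned to ToyData, it appears in the “Environment” window and can be reused in later code chunks.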
The here package automatically sets our K300_L folder as the top-level working directory.
Goal: Read data and assign to R object.
Load the tidyverse, haven, and here packages.
To install a package that you do not already have, type install.packages("tidyverse") in the R console. Alternatively, you can type that into an R chunk - just remember to comment out the command after running it (by adding a “#” in front of it).
Read in the data and assign it to an R object named YRBS2013data.
Type YRBS2013data. This will call the object and provide a brief view of the data. (Note: You can get a similar but more visually appealing view by simply clicking on the object in the “Environment” window. More on this later.) Your RStudio session should now look a lot like this:
Goal: Use the sjPlot::view_df() and attr() functions to complete the “SPSS Exercises” at the end of B&P’s Ch. 1 (pp. 20-21).
This week’s assignment will ask questions that parallel those found in the SPSS exercises at the end of B&P’s Chapter 1. In this section, you will learn about a couple functions that will help you answer these questions.
First, refer to B&P’s Chapter 1, Question 2, on page 21 (“Navigating SPSS”), which refers to a “Variable View” in SPSS. While RStudio does not have a built-in “Variable View” like the one found in SPSS, we can generate something similar using the view_df() function from the sjPlot package. Additionally, with this function, you should be able to answer these questions. (For additional instructions on view_df() and other methods for describing variables in R, see Martin Chan’s blog on viewing SPSS labels in R.)
Install the sjPlot package.
Remember to comment out the install.packages("sjPlot") line afterwards. You do not want to keep installing packages every time you run your R code. Alternatively, some people recommend typing install.packages() commands directly into the RStudio Console (bottom left of RStudio) or using the install option under the “Packages” tab (bottom right of RStudio).
Load the sjPlot package library.
Using the view_df() function will also introduce you to a “pipe” (%>%), an immensely useful coding element from the magrittr package (made available when you load the tidyverse) that efficiently links together sequenced actions. In this case, we will call the data object (YRBS2013data) and then use a pipe to connect it in sequence to the view_df() function from sjPlot.
YRBS2013data %>% view_df()
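To make the idea of “linking sequenced actions” concrete, the minimal sketch below compares a piped call with the equivalent ordinary function call. It assumes the magrittr package is installed. The object toy_data is hypothetical, and nrow() stands in for view_df() so that the example runs without the course data.

```r
# Minimal sketch: a pipe passes the object on its left to the function on its right.
# toy_data is a hypothetical stand-in for a data object such as YRBS2013data.
library(magrittr)

toy_data <- data.frame(age = c(15, 16, 17, 18))

# Piped version: call the data object, then apply a function to it.
piped <- toy_data %>% nrow()

# Ordinary version: the same function applied directly to the same object.
nested <- nrow(toy_data)

# Both approaches return the same result.
identical(piped, nested)
```

Read the pipe as “and then”: call toy_data, and then count its rows.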
Now, refer back to B&P’s SPSS Exercise 2.ii.1 at the end of Chapter 1, which asks about the following four variables: a. Row 2 b. Row 4 c. Row 23 d. Row 45
Using view_df(), you can answer questions about the variable name for a given row by referring to the Name column (see “Viewer” tab). For instance, the variable name for Row 5 is “race7.” You can also answer questions about the variable label by referring to the Label column. For example, the variable label for “race7” is 7-level race variable. Finally, you can answer questions about value labels (e.g., survey response options) by referring to the Value Labels column. For instance, the value labels for “race7” are 1=Am Indian/Alaska Native, 2=Asian, 3=Black or African American, etc., through 7=Multiple - Non-Hispanic.
While the sjPlot::view_df() function shows all variables in a dataframe (similar to the “Variable View” in SPSS), the attr() function can be used to describe the attributes of a specific variable.
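The difference between describing every variable and describing one variable can be sketched in base R. This is only a rough illustration and is not how view_df() is implemented. The object toy_data, the helper get_label(), and the label given to age are all hypothetical.

```r
# Minimal sketch: labels are stored as attributes attached to each variable.
# toy_data, get_label(), and the "age" label are hypothetical.
toy_data <- data.frame(race7 = c(1, 2, 7), age = c(15, 16, 17))
attr(toy_data$race7, "label") <- "7-level race variable"
attr(toy_data$age, "label") <- "Age in years (hypothetical label)"

# Helper: return a variable's label, or NA if it has none.
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}

# All variables at once: a small table of names and labels.
var_table <- data.frame(
  Name = names(toy_data),
  Label = unname(vapply(toy_data, get_label, character(1))),
  stringsAsFactors = FALSE
)
var_table

# One specific variable: list every attribute attached to it.
attributes(toy_data$race7)
```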
Note the double colon (::). Since R is open-source technology, different user-written packages often use the same name for their functions. For instance, in the next assignment, you will learn about the select() function from the “dplyr” package. The term “select” is quite common, so the select() command may have conflicts across packages. One way to ensure that you are calling the function from the package that you want is by specifically calling the package first, followed by a double colon and then the function, using the following format: package::function().
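A name conflict of this kind can be reproduced with base R alone. In the minimal sketch below, the user-written median() function is hypothetical and deliberately reuses the name of a function from the stats package; the package::function() format then makes clear which version is being called.

```r
# Minimal sketch: two functions share a name, and :: picks one explicitly.
values <- c(3, 1, 2)

# A hypothetical user-written function that reuses the name "median".
median <- function(x) "my own median"

# Without a package prefix, R finds the user-written version first.
median(values)

# With the package prefix, R calls median() from the stats package.
stats::median(values)
```

The first call returns the text "my own median", while the second returns 2, the actual median of the three values.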
Our use of the attr() function will once again rely on a pipe to link our data to a specific action. Additionally, it will introduce you to another important coding symbol, the $, which is used to call a specific element (e.g., a column or variable) within an object (e.g., dataframe or tibble). In this case, we will call a specific variable (race7) from our data object (YRBS2013data) like this: YRBS2013data$race7. We then use a pipe to connect it in sequence to the attr() function.
The singular attr('label') typically requests a variable label, whereas the plural attr('labels') typically requests the value labels.
To view the variable label for the race7 variable, type the following into your new code chunk: YRBS2013data$race7 %>% attr('label').
To view the value labels for the race7 variable, type the following into your new code chunk:
YRBS2013data$race7 %>% attr('labels')
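If you want to try these commands without the course data, the minimal sketch below rebuilds the same pattern on a toy variable. It assumes the magrittr package is installed. The object toy_data is hypothetical, and only three of the seven race7 value labels are attached, to keep the example short.

```r
# Minimal sketch: call one variable with $, then pipe it to attr().
# toy_data is a hypothetical stand-in for YRBS2013data.
library(magrittr)

toy_data <- data.frame(race7 = c(1, 2, 7))

# Attach a variable label (singular) and value labels (plural) by hand.
attr(toy_data$race7, "label") <- "7-level race variable"
attr(toy_data$race7, "labels") <- c(
  "Am Indian/Alaska Native" = 1,
  "Asian" = 2,
  "Multiple - Non-Hispanic" = 7
)

# The $ symbol calls the race7 variable from the toy_data object.
toy_data$race7

# Singular 'label' requests the variable label.
var_label <- toy_data$race7 %>% attr("label")

# Plural 'labels' requests the value labels.
value_labels <- toy_data$race7 %>% attr("labels")

var_label
value_labels
```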
You should now have everything that you need to complete the questions in Assignment 2 that parallel those from B&P’s SPSS Exercises for Chapter 1!
view_df() from the “sjPlot” package to view variable information in a data file?
The pipe (%>%) coding operator (from the “magrittr” package) to call a data object then apply a function (e.g., view_df()) to a variable in that data object?
attr() function to view variable and attribute value labels?
$ symbol can be used to call a specific element (e.g., a variable or column) within an object (e.g., dataframe or tibble)?
NA for variables in your data file?
package::function() format?