The purpose of this second assignment is to help you begin to explore your data in R and to do so within an R Markdown document. The specific activities were inspired by the SPSS Exercises from the end of Chapter 1 in Bachman and Paternoster’s Statistics for Criminology & Criminal Justice, 4th Ed.
First, you will learn to create an R Markdown file in which to save and present your work for this class.
As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
(Note: Remember that, when following instructions, always substitute “LastName” for your own last name and substitute YEAR_MO_DY for the actual date. E.g., Brauer_P680_Assign2_2021_08_30)
In the first assignment, you learned about writing and running R code and making comments in an R Script file. You also saw how running certain commands (e.g., read_spss) from an R Script file will generate results in the RStudio Console and learned how to save the results of such commands into an R object. For Assignment 2, you will learn to create a new R Markdown file, and you will complete the remainder of your assignment in that file.
here package will automatically set our P680_work folder as the top-level directory.
OK to create your new R Markdown file. It should look like this:
<Enter> to leave a blank line between the header and the first line of text.
<Enter> and, on the next line (line 9), type: ### Learning R Markdown
<Enter> to leave another blank line (line 10).
First, you need to get data into RStudio. You already know how to do that in an R Script. It is the same process in R Markdown, except you need to add an “R code chunk” into your file.
tidyverse, haven, and here
YRBS2013data
YRBS2013data
sjPlot::view_df() and attr() functions to view variables and their attributes.
The following activities parallel some of the “SPSS Exercises” found at the end of B&P’s Chapter 1, where those authors walk readers through the SPSS program’s “Variable View” feature (e.g., see “Navigating SPSS” in B&P’s Ch. 1, Question 2, on page 21).
If you have used SPSS before, then you might be familiar with its “Variable View” feature, and there may be times when you find yourself missing this feature while working in RStudio. While RStudio does not have a built-in “Variable View” function, we can generate something similar using the view_df() function from the sjPlot package. (For additional instructions on view_df() and other methods for describing variables in R, see Martin Chan’s blog on viewing SPSS labels in R.)
sjPlot package
install.packages("sjPlot") line afterwards. You do not want to keep installing packages every time you run your R code. Alternatively, some people recommend typing install.packages() commands directly into the RStudio Console (bottom left of RStudio) or using the install option under the “Packages” tab (bottom right of RStudio).
sjPlot package library
view_df() function will also introduce you to a “pipe” - %>% - an immensely useful coding element from the tidyverse package that efficiently links together sequenced actions. In this case, we will call the data object (YRBS2013data) and then use a pipe to connect it in sequence to the view_df() function from sjPlot.
YRBS2013data %>% view_df()
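As a minimal sketch of what the pipe does, the example below uses a small made-up data frame (toy_data is hypothetical, not the YRBS data) and base R's nrow() in place of view_df(), so that it runs without sjPlot. It assumes the magrittr package is installed; magrittr supplies %>% and is installed along with the tidyverse.

```r
# Load the package that provides the %>% pipe (installed with the tidyverse).
library(magrittr)

# Hypothetical toy data standing in for YRBS2013data.
toy_data <- data.frame(age = c(15, 16, 17), grade = c(9, 10, 11))

# Without a pipe: the data object goes inside the function call.
nested_result <- nrow(toy_data)

# With a pipe: the object on the left becomes the first argument
# of the function on the right.
piped_result <- toy_data %>% nrow()

# Both forms return the same value.
identical(nested_result, piped_result)
```

Read the pipe aloud as "and then": take toy_data, and then count its rows. YRBS2013data %>% view_df() follows the same pattern.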
In B&P’s SPSS Exercise 2.ii.1 at the end of Chapter 1, the authors ask readers to use Variable View to describe the variable name, variable label, and value labels for the following four variables:
a. Row 2
b. Row 4
c. Row 23
d. Row 45
Using view_df(), you can find the variable name for a given row by referring to the Name column (see “Viewer” tab). For instance, the variable name for Row 5 is “race7.” You can find the variable label by referring to the Label column. For example, the variable label for “race7” is 7-level race variable. Finally, you find value labels (e.g., survey response options) by referring to the Value Labels column. For instance, the value labels for “race7” are 1=Am Indian/Alaska Native, 2=Asian, 3=Black or African American, etc., through 7=Multiple - Non-Hispanic.
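To make the three columns concrete, here is a rough base R sketch that builds a similar Name / Label / Value Labels table by hand. It is not how view_df() is implemented. The toy_data object, the sex variable, and its labels are made up for illustration, the race7 value labels are truncated to the first three, and the sketch assumes labels are stored in attributes named 'label' and 'labels'.

```r
# Hypothetical toy data with SPSS-style attributes attached by hand.
toy_data <- data.frame(sex = c(1, 2, 2), race7 = c(3, 1, 2))
attr(toy_data$sex, "label") <- "Sex of respondent"  # made-up label
attr(toy_data$sex, "labels") <- c(Female = 1, Male = 2)
attr(toy_data$race7, "label") <- "7-level race variable"
attr(toy_data$race7, "labels") <- c(
  "Am Indian/Alaska Native" = 1,
  "Asian" = 2,
  "Black or African American" = 3
)

# Collect each column's name, variable label, and value labels.
variable_view <- data.frame(
  Name = names(toy_data),
  Label = unname(vapply(
    toy_data,
    function(column) attr(column, "label"),
    character(1)
  )),
  Value_Labels = unname(vapply(
    toy_data,
    function(column) {
      value_labels <- attr(column, "labels")
      # Format as "1=Female, 2=Male"
      paste0(value_labels, "=", names(value_labels), collapse = ", ")
    },
    character(1)
  )),
  stringsAsFactors = FALSE
)

variable_view
```

Each row of variable_view describes one variable, which mirrors how you read the Viewer output: find the row, then read across the columns.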
While the sjPlot::view_df() function shows all variables in a dataframe (similar to the “Variable View” in SPSS), the attr() function can be used to describe the attributes of a specific variable.
attr() function once again will use a pipe to link our data to a specific action. Additionally, it will introduce you to another important coding symbol - the $ - used to call a specific named element (e.g., a column or variable) within an object (e.g., dataframe or tibble). In this case, we will call a specific variable (race7) from our data object (YRBS2013data) like this: YRBS2013data$race7. We then use a pipe to connect it in sequence to the attr() function.
attr('label') typically requests a variable label, whereas the plural attr('labels') typically requests the value labels.
race7 variable, type the following into your new code chunk: YRBS2013data$race7 %>% attr('label')
race7 variable, type the following into your new code chunk: YRBS2013data$race7 %>% attr('labels')
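The difference between the singular and plural forms can be tried out on a small stand-in object. In this sketch, toy_data and unlabelled are hypothetical, the attributes are attached by hand rather than read from an SPSS file, and the race7 value labels are truncated to the first three. It uses plain attr(x, "name") calls, which are equivalent to the piped form.

```r
# Hypothetical toy data (not the YRBS data) with hand-attached attributes.
toy_data <- data.frame(race7 = c(3, 1, 2))
attr(toy_data$race7, "label") <- "7-level race variable"
attr(toy_data$race7, "labels") <- c(
  "Am Indian/Alaska Native" = 1,
  "Asian" = 2,
  "Black or African American" = 3
)

# $ pulls one variable out of the data object.
race_values <- toy_data$race7

# 'label' (singular) returns the variable label.
variable_label <- attr(race_values, "label")

# 'labels' (plural) returns the named vector of value labels.
value_labels <- attr(race_values, "labels")

variable_label
value_labels

# A variable with value labels but no variable label (made up).
unlabelled <- structure(c(1, 2), labels = c(No = 1, Yes = 2))

# exact = TRUE stops attr() from partially matching 'label' to 'labels'.
missing_label <- attr(unlabelled, "label", exact = TRUE)
is.null(missing_label)
```

One caution: when no attribute is named exactly 'label', attr() falls back to a unique partial match, so attr(unlabelled, "label") would return the value labels instead. Adding exact = TRUE avoids that surprise.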
You should now have everything that you need to view and describe the variables, variable labels, value labels, and response values in a dataset!
attr('label') and attr('labels') functions in code chunks and describe in your markdown text the variable name and variable label (if the variable has both a name and a label) as well as the numeric response values and associated response value labels for the variables in the following four rows: Row 2; Row 4; Row 23; Row 45.
Although the view_df() function can be quite useful when working with data in an RMD document, including this code chunk in your knitted document will result in several pages containing a print-out of all the variables in your dataset. We usually do not want this type of stuff in our final, clean knitted document. In the first line of that R code chunk, try placing a comma right after the {r and then typing eval=FALSE before the closing } bracket. The option eval = FALSE means that the code chunk should not be evaluated (i.e., “run”) when knitting the document. There are various other useful code chunk options, such as include = FALSE, echo = FALSE, or results = FALSE. For more information, check out here, here and here.