Assignment 3 Objectives
The purpose of this third assignment is to help you use R to complete some
of the SPSS Exercises from the end of Chapters 2 and 3 in Bachman,
Paternoster, & Wilson’s Statistics for Criminology
& Criminal Justice, 5th Ed.
These chapters focused on data distributions and displaying data with
tabular or graphical representations. As with the two previous
assignments, you will be using R Markdown (with R & RStudio) to
complete and present your work. In this assignment, you will learn how
to recode variables, generate frequency tables, and create simple graphs
in R.
By the end of assignment #3, you should be able to…
- create simple frequency tables using sjmisc::frq() and
summarytools::freq()
- identify strengths and limitations of frq() and freq() for creating
frequency tables
- sort a frequency table by frequency, from highest to lowest and from
lowest to highest
- recognize summarytools::dfSummary() as another way to quickly describe
one or more variables in a data file
- use the ggplot() function from the “ggplot2” package to generate basic
bar charts and histograms
- select specific variables using dplyr::select()
- recode variables using the mutate() and if_else() functions from the
“dplyr” package
- understand how the if_else() function works and why we use it instead
of ifelse()
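To make the last objective more concrete, here is a minimal sketch of one commonly cited difference between base R's ifelse() and dplyr's if_else(). The values in x and the "high"/"low"/"unknown" labels are made up for illustration, and this is offered as one possible illustration rather than as the assignment's own explanation.

```r
library(dplyr)

# Made-up example values, including one missing value
x <- c(1, 5, NA, 10)

# Base R: missing values stay NA
base_result <- ifelse(x > 4, "high", "low")

# dplyr: the 'missing' argument lets you say what NA should become
dplyr_result <- if_else(x > 4, "high", "low", missing = "unknown")

# Mixing types: ifelse() quietly converts the number 0 to the text "0"
mixed_base <- ifelse(x > 4, "high", 0)

# if_else() refuses to mix text and numbers and stops with an error;
# tryCatch() is used here only so the sketch keeps running
mixed_dplyr <- tryCatch(
  if_else(x > 4, "high", 0),
  error = function(e) "error"
)
```

The stricter behavior of if_else() can surface a recoding mistake right away instead of letting a variable silently change type.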
Assumptions & Ground Rules
We are building on objectives from Assignments 1 & 2. By the start of
this assignment, you should already know how to:
Basic R/RStudio skills
- create an R Markdown (RMD) file and add/modify text, level headers,
and R code chunks within it
- install/load R packages and use hashtags (“#”) to comment out
sections of R code so it does not run
- recognize when a function is being called from a specific package
using a double colon with the package::function() format
- read in an SPSS data file in an R code chunk using haven::read_spss()
and assign it to an R object using an assignment (<-) operator
- use the $ symbol to call a specific element (e.g., a variable, row, or
column) within an object (e.g., dataframe or tibble), such as with the
format dataobject$varname
- use a tidyverse %>% pipe operator to perform a sequence of actions
- knit your RMD document into an HTML file that you can then save and
submit for course credit
Reproducibility
- use here() for a simple and reproducible self-referential file
directory method
- use groundhog.library() as an optional but recommended reproducible
alternative to library() for loading packages
Data viewing & wrangling
- use sjPlot::view_df() to quickly browse variables in a data file
- use attr() to identify variable and attribute value labels
If you do not recall how to do these things, first review
Assignments 1 & 2.
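As a quick refresher, several of the skills listed above can be sketched with R's built-in mtcars data standing in for a course data file. The object names (cars, two_vars, and so on) and the "Miles per gallon" label are made up for illustration and do not come from the earlier assignments.

```r
library(dplyr)  # also makes the %>% pipe available

# Built-in example data standing in for a course data file
cars <- mtcars

# $ pulls one variable (column) out of a data object
mpg_values <- cars$mpg

# package::function() names the package a function comes from
two_vars <- dplyr::select(cars, mpg, cyl)

# %>% passes the result on its left to the function on its right
n_rows <- cars %>%
  dplyr::select(mpg, cyl) %>%
  nrow()

# attr() sets or reads an attribute such as a variable label;
# this label is invented for the example
attr(cars$mpg, "label") <- "Miles per gallon"
mpg_label <- attr(cars$mpg, "label")

# A leading # comments a line out so it does not run
# print(two_vars)
```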
Additionally, you should have read the assigned book chapters and
reviewed the SPSS questions that correspond to this assignment, and you
should have completed any other course materials (e.g., videos;
readings) assigned for this week before attempting this R assignment. In
particular, for this week, I assume you understand:
- units of analysis
- variable levels of measurement
- skewness
- rates, percents, proportions, intervals, and interval widths
- appropriate graphs for different types of variables (e.g., at
different levels of measurement)
- difference between histograms and bar charts and when each is
appropriate
- difference between histograms and line graphs and when each is
appropriate
As noted previously, for this and all future assignments, you MUST
type all commands in by hand. Do not copy & paste except for
troubleshooting purposes (i.e., if you cannot figure out what you
mistyped).
- Early on, you may have a lot of trouble getting your code to run due
to minor typos. This is normal.
- Remember, you are learning to read and write a new (coding)
language. As with learning any new language, we learn from practice
and from correcting our mistakes.
Part 1 (Assignment 3.1)
Goal: Create a new RMD file for Assignment 3
(Note: Remember that, when following instructions, always replace
“LastName” with your own last name and YEAR_MO_DY with the actual
date. E.g., 2022_09_01_Fordham_K300_Assign3)
In the second assignment, you learned how to read in and assign a
dataset to an R object. You also learned how to use the view_df()
function from the “sjPlot” package and the base R attr() function to
display your dataframe and identify variable attributes. In this third
assignment, you will use the “sjmisc” and “summarytools” packages to
display your descriptive data in frequency tables. You will also learn
about the dfSummary() function from the “summarytools” package, which
is an alternative to sjPlot::view_df() for creating a useful summary of
all or a subset of the variables in a dataset. Additionally, you will
learn how to select and recode variables using the select(), mutate(),
and if_else() functions from the “dplyr” package, and how to display
your data in basic bar charts or histograms using the ggplot() function
from the “ggplot2” package.
- Go to your K300_L folder, which should contain the R Markdown file
you created for Assignment 2 (named
YEAR_MO_DY_LastName_K300Assign2). Click to open the R
Markdown file.
- Remember, we open RStudio in this way so the here package will
automatically set our K300_L folder as the top-level directory.
- In RStudio, open a new R Markdown document. If you do not recall how
to do this, refer to Assignment 1.
- The dialogue box asks for a Title, an Author, and a Default Output
Format for your new R Markdown file.
- In the Title box, enter K300 Assignment 3.
- In the Author box, enter your First and Last Name
(e.g., Tyeisha Fordham).
- In the Default Output Format box, be sure “HTML” is
selected (HTML is usually the default selection).
- Remember that the new R Markdown file contains a simple
pre-populated template to show users how to do basic tasks like add
settings, create text headings and text, insert R code chunks, and
create plots. Be sure to delete all the text after
the YAML header before you begin working.
- This assignment must be completed by the student and the student
alone. To confirm that this is your work, please begin all assignments
with this text:
- This R Markdown document contains my work for Assignment 3.
It is my work and only my work.
Part 2 (Assignment 3.2)
- Create a second-level heading titled: “Part 1 (Assignment 3.1):
Reading in and viewing 2012 States Data”
- Remember, a second-level heading starts with two hashtags followed
by a space and the heading title, like this: ## Heading Title
- A third-level heading starts with three hashtags: ### Heading
Title
- A fourth-level heading starts with four hashtags: #### Heading
Title
- Now, you need to get data into RStudio. You already know how to do
this, but please refer to Assignment 1 if you have questions.
- Create a third-level header in your R Markdown (hereafter, “RMD”)
file titled: “Load Libraries”
- Insert an R code chunk
- Inside the new R code chunk, load the following six packages:
tidyverse, haven, here, sjmisc, sjPlot, and summarytools.
- Some of these packages will need to be installed. Remember, you only
need to install a package once, but you must load a package each time
you start a new R session and need to use the package.
- After your first code chunk, create another third-level header in
RMD titled: “Read Data into R”
- Insert another R code chunk.
- In the new R code chunk, read in the “2012 states data.sav” SPSS
data file and assign it to an R data object named StatesData2012.
- Forget how to do this? Refer to instructions in Assignment 1.
- In the same code chunk, on a new line below your read data/assign
object command, type the name of your new R data object:
StatesData2012.
- This will call the object and provide a brief view of the data.
(Note: You can also simply click
on the data object in the “Environment” window.)
- Your RStudio session should now look a lot like this:
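As a rough guide, the contents of the two code chunks described in the steps above might look something like the following sketch. The "Datasets" subfolder name is an assumption about where you saved the data file inside your K300_L folder; adjust the path to match your own setup. The data_path object and the file.exists() check are additions made only so the sketch runs without error when the file is missing.

```r
# Chunk under "Load Libraries": load the six packages
library(tidyverse)
library(haven)
library(here)
library(sjmisc)
library(sjPlot)
library(summarytools)

# Chunk under "Read Data into R"
# ASSUMPTION: the data file sits in a "Datasets" subfolder of K300_L
data_path <- here("Datasets", "2012 states data.sav")

# The file.exists() check only keeps this sketch from failing when the
# file is absent; the two lines inside it are the actual steps
if (file.exists(data_path)) {
  StatesData2012 <- read_spss(data_path)  # read the file and assign it
  StatesData2012                          # call the object to view it
}
```

If a library() call fails with a "there is no package called" error, that package still needs to be installed once with install.packages() before it can be loaded.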