The purpose of this third assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 2 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.
This chapter focused on data distributions and aggregation. As with the two previous assignments, you will be using R Markdown (with R & RStudio) to complete and present your work. In this assignment, you will learn how to recode variables, generate frequency tables, and create simple graphs in R.
sjmisc::frq() and summarytools::freq()
frq() and freq() for creating frequency tables
summarytools::dfSummary() as another way to quickly describe one or more variables in a data file
We are building on objectives from Assignments 1 & 2. By the start of this assignment, you should already know how to:
use the package::function() format
read in a data file with haven::read_spss() and assign it to an R object using an assignment (<-) operator
use the $ symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format dataobject$varname
use the %>% pipe operator to perform a sequence of actions
use here() for a simple and reproducible self-referential file directory method
use sjPlot::view_df() to quickly browse variables in a data file
use attr() to identify variable and attribute value labels
If you do not recall how to do these things, first review Assignments 1 & 2.
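As a quick refresher, the sketch below illustrates three of these skills (the assignment operator, the $ symbol, and attr()) using base R only. The toy_data object, its column names, its values, and the label text are all made up for illustration; they are not part of the course data.

```r
# Assign a small, made-up data frame to an object with the <- operator
toy_data <- data.frame(
  state = c("A", "B", "C"),
  rate  = c(2.5, 4.0, 7.1)
)

# Use $ to pull one variable (column) out of the data object
toy_data$rate

# Attach a variable label as an attribute, then read it back with attr()
attr(toy_data$rate, "label") <- "Hypothetical rate per 100k"
attr(toy_data$rate, "label")
```

Data read in with haven::read_spss() usually arrive with "label" attributes like this already attached, which is why attr() can retrieve them.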
Additionally, you should have read the assigned book chapters and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:
As noted previously, for this and all future assignments, you MUST type all commands in by hand. Do not copy & paste except for troubleshooting purposes (i.e., if you cannot figure out what you mistyped).
Goal: Create a new RMD file for Assignment 3
(Note: Remember that, when following instructions, always substitute “LastName” for your own last name and substitute YEAR-MO-DY for the actual date. E.g., 2023-01-25_Ducate_CRIM5305_Assign03)
In the second assignment, you learned how to read in and assign a dataset to an R object. You also learned how to use the view_df() function from the sjPlot package and the base R attr() function to display your dataframe and identify variable attributes. In this third assignment, you will use the “sjmisc” and “summarytools” packages to display your descriptive data in frequency tables. You will also learn about the dfSummary() function from the “summarytools” package, which is an alternative to sjPlot::view_df() for creating a useful summary of all or a subset of the variables in a dataset.
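Here is a minimal sketch of what a dfSummary() call can look like (note the capital S in the function name). It does not use the course data file. Instead, it rebuilds the Murdercat codes from the frequency counts printed later in this assignment, so the murder_toy object name and the row order are illustrative assumptions.

```r
# Stand-in data: Murdercat codes rebuilt from the counts printed later in
# this assignment (17, 21, 10, 1, and 1 states for codes 0 through 4).
# The object name and the row order are illustrative assumptions.
murder_toy <- data.frame(
  Murdercat = rep(0:4, times = c(17, 21, 10, 1, 1))
)

# dfSummary() prints one row per variable, with its type, basic statistics,
# frequencies, and counts of valid and missing values.
# The check below simply skips the call if summarytools is not installed.
if (requireNamespace("summarytools", quietly = TRUE)) {
  print(summarytools::dfSummary(murder_toy))
}
```

With the course data loaded, the equivalent call would be StatesData2012 %>% dfSummary().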
The here package will automatically set our CRIM5305_L folder as the top-level directory.
Goal: Read in 2012 States Data and view variable information
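Load the following packages: tidyverse, haven, here, sjmisc, sjPlot, and summarytools.
Read in the “2012StatesData.sav” data file and assign it to an R object named StatesData2012.
Type StatesData2012 to print the data object.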
StatesData2012 <- read_spss(here("Datasets", "2012StatesData.sav"))
StatesData2012
## # A tibble: 50 × 30
## State Numbe…¹ Numbe…² South Region Permo…³ cigtax smoke…⁴ tobac…⁵ Persm…⁶
## <chr> <dbl> <dbl> <dbl+l> <dbl+l> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Alaba… 335 1981 1 [Sou… 1 [Sou… 15.1 0.425 0 318. 22
## 2 Alaska 54 216 0 [Non… 2 [Wes… 21.5 2 0 270. 22
## 3 Arizo… 443 607 0 [Non… 2 [Wes… 19.7 2 1 247. 16
## 4 Arkan… 200 1269 1 [Sou… 1 [Sou… 17.6 1.15 1 324. 22
## 5 Calif… 2766 1882 0 [Non… 2 [Wes… 15.6 0.87 0 235 14
## 6 Color… 349 668 0 [Non… 2 [Wes… 18.2 0.84 1 238. 18
## 7 Conne… 228 418 0 [Non… 4 [Nor… 11 3 0 238. 16
## 8 Delaw… 63 156 1 [Sou… 1 [Sou… 14.1 1.6 1 281. 18
## 9 Flori… 1229 1712 1 [Sou… 1 [Sou… 15.9 1.34 1 259. 17
## 10 Georg… 680 2322 1 [Sou… 1 [Sou… 16.5 0.37 0 299. 19
## # … with 40 more rows, 20 more variables: totalpop <dbl>, DivorceRt <dbl>,
## # perfampoverty <dbl>, perindpoverty <dbl>, MedianIncome <dbl>,
## # Pernoinsurance <dbl>, MurderRt <dbl>, RobberyRt <dbl>, AssaultRt <dbl>,
## # BurglaryRt <dbl>, MVTheftRT <dbl>, InfantMort <dbl>, HeartDeathRt <dbl>,
## # CancerDeathRt <dbl>, PerBachelorD <dbl>, PercentRural <dbl>,
## # Percent18to24 <dbl>, ID <dbl>, Assault_bin <dbl+lbl>, Murdercat <dbl+lbl>,
## # and abbreviated variable names ¹Number1824, ²NumberRural, ³Permoved, …
Now, let’s view the variables in the data. In the SPSS program referenced in the book, one would click on the “variable view” tab. Recall, one way to see the variables in your data is to simply click the data object in your R environment (“StatesData2012”). This will open another window in which you can see your variables and every row of observations (akin to “data view” in SPSS). Recall, however, for a “variable view” equivalent in R, you can use the sjPlot::view_df() function:
StatesData2012 %>% view_df()
Type this code and hit RUN or press CMD/CTRL + Enter.
Now, refer back to B&P’s SPSS Exercise at the end of Chapter 2 (pages 41-42) and answer the questions, which ask about the unit of analysis and the following variables:
Goal: Use R to create frequency tables for the Murdercat variable (Questions 11-13, Ch. 2, pp. 41-42).
In Chapter 2, you learned about levels of measurement and about how frequency tables are used in descriptive research. While there are many different ways to describe variables, frequency tables are one of the most basic and efficient ways to do so. Frequency tables describe the number of occurrences in our data for each variable attribute or for grouped variable attributes. There are many ways to generate frequency tables, and we will only cover a couple of them here.
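To make the idea concrete, here is a small base R sketch that builds a frequency table by hand. It is only meant to show what the counts, percentages, and cumulative percentages are; it is not how the packages used in this assignment work internally. The murdercat vector is rebuilt from the Murdercat counts that appear in the freq() output later in this assignment, and the object names are illustrative.

```r
# Murdercat codes rebuilt from the counts shown in this assignment's output:
# code 0 = 17 states, 1 = 21, 2 = 10, 3 = 1, 4 = 1 (row order is arbitrary)
murdercat <- rep(0:4, times = c(17, 21, 10, 1, 1))

counts  <- table(murdercat)          # number of occurrences of each value
percent <- 100 * prop.table(counts)  # each count as a share of all 50 states
cum_pct <- cumsum(percent)           # running total of those shares

# Combine the three pieces into one table
freq_table <- data.frame(
  value   = names(counts),
  n       = as.vector(counts),
  percent = as.vector(percent),
  cum_pct = as.vector(cum_pct)
)
freq_table
```

The n, percent, and cum_pct columns correspond to the frequency, percent, and cumulative percent columns that frq() and freq() report.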
Suppose we want to generate a frequency table for the “Murdercat” (or murder rate categorical) variable. Here is a simple way to get that using the frq() function from the “sjmisc” package:
StatesData2012 %>% frq(Murdercat)
This code uses the frq() command from the “sjmisc” package. frq() displays a basic frequency table of the designated variable(s). The table should show the value labels (e.g., “0 to 3 murders per 100k”; “3.1 to 6 murders per 100k”; “6.1 to 9 murders per 100k”; etc.) and the N, or the total number of units (states) in the dataset. It also shows the percentage of states within each attribute value, including the cumulative percentage.
As noted, this is one way to create a frequency table, but there are lots of other packages that we could use instead. Each package and function has its own strengths. For examples, see here.
While the above frequency table is easy to generate and has the descriptive information we need, it is not easy to sort the table output. So, we will also introduce you to the freq() function from the “summarytools” package. This package is described in the link above and in more detail here.
Type StatesData2012 %>% freq(Murdercat).
Note that this function is freq(), not frq(). Both functions (with or without the ‘e’) will work as long as their respective packages are loaded: “summarytools” for freq() and “sjmisc” for frq().
Try both the data %>% freq(variable) and data %>% frq(variable) code (with and without an ‘e’) to see for yourself. One difference is that, unlike the sjmisc::frq() output, the summarytools::freq() output does not include a variable’s attribute value labels (e.g., “0 to 3 murders per 100k”).
An advantage of the summarytools::freq() function is that it allows us to easily sort a frequency table, such as by highest to lowest frequencies or vice versa. For example, if we want to sort from highest to lowest frequencies, with the highest frequencies on top, we can use freq(variable, order = "freq").
Type StatesData2012 %>% freq(Murdercat, order = "freq"). This will sort the frequencies of Murdercat from highest to lowest.
To reverse the order, add a minus sign before freq (data %>% freq(variable, order = "-freq")). This means the frequencies will be sorted from lowest to highest. Sort the Murdercat variable from lowest to highest frequencies and complete the rest of the SPSS exercises on page 42.
#sorted by frequency, high to low
StatesData2012 %>%
freq(Murdercat, order = "freq")
## Frequencies
## StatesData2012$Murdercat
## Label: Murder rate categorical
## Type: Numeric
##
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           1     21     42.00          42.00     42.00          42.00
##           0     17     34.00          76.00     34.00          76.00
##           2     10     20.00          96.00     20.00          96.00
##           3      1      2.00          98.00      2.00          98.00
##           4      1      2.00         100.00      2.00         100.00
##        <NA>      0                               0.00         100.00
##       Total     50    100.00         100.00    100.00         100.00
#sorted by frequency, low to high
StatesData2012 %>%
freq(Murdercat, order = "-freq")
## Frequencies
## StatesData2012$Murdercat
## Label: Murder rate categorical
## Type: Numeric
##
##               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
## ----------- ------ --------- -------------- --------- --------------
##           3      1      2.00           2.00      2.00           2.00
##           4      1      2.00           4.00      2.00           4.00
##           2     10     20.00          24.00     20.00          24.00
##           0     17     34.00          58.00     34.00          58.00
##           1     21     42.00         100.00     42.00         100.00
##        <NA>      0                               0.00         100.00
##       Total     50    100.00         100.00    100.00         100.00
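If you want to double-check which direction each sort goes, the same two orderings can be reproduced in base R with sort(). This is only a cross-check under the assumption that the murdercat vector below (rebuilt from the counts in the tables above) stands in for the real variable; it is not how summarytools sorts internally.

```r
# Murdercat codes rebuilt from the counts in the tables above
murdercat <- rep(0:4, times = c(17, 21, 10, 1, 1))
counts <- table(murdercat)

# Highest to lowest frequency, comparable to order = "freq"
high_to_low <- sort(counts, decreasing = TRUE)
high_to_low

# Lowest to highest frequency, comparable to order = "-freq"
low_to_high <- sort(counts)
low_to_high
```

In both cases the table is reordered by its counts rather than by the category codes, which is why code 1 (21 states) moves to the top or the bottom.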
Add plain.ascii = FALSE, style = 'rmarkdown' after order = "freq" in your freq() code. This should generate a table with RMD text formatting (e.g., with asterisks and vertical lines throughout) rather than the default plain text output:
#clean it up for knitted RMD
StatesData2012 %>%
freq(Murdercat, order = "freq", plain.ascii = FALSE, style = 'rmarkdown')
## ### Frequencies
## #### StatesData2012$Murdercat
## **Label:** Murder rate categorical
## **Type:** Numeric
##
## | | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
## |-----------:|-----:|--------:|-------------:|--------:|-------------:|
## | **1** | 21 | 42.00 | 42.00 | 42.00 | 42.00 |
## | **0** | 17 | 34.00 | 76.00 | 34.00 | 76.00 |
## | **2** | 10 | 20.00 | 96.00 | 20.00 | 96.00 |
## | **3** | 1 | 2.00 | 98.00 | 2.00 | 98.00 |
## | **4** | 1 | 2.00 | 100.00 | 2.00 | 100.00 |
## | **\<NA\>** | 0 | | | 0.00 | 100.00 |
## | **Total** | 50 | 100.00 | 100.00 | 100.00 | 100.00 |
For more detail, see the summarytools forum above, or try knitting your document with and without these additions and comparing the output.
You should now have everything that you need to complete the rest of the questions in Assignment 3 that parallel those from B&P’s SPSS Exercises for Chapter 2!
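Returning to the plain.ascii and style arguments used above: if you create several frequency tables, one option is to set those two arguments once with summarytools::st_options() rather than repeating them in every freq() call. The sketch below assumes summarytools is installed and, as in the earlier checks, uses a murdercat vector rebuilt from the printed counts in place of the course data.

```r
# Murdercat codes rebuilt from the counts shown in this assignment's output
murdercat <- rep(0:4, times = c(17, 21, 10, 1, 1))

# The check below simply skips the example if summarytools is not installed
if (requireNamespace("summarytools", quietly = TRUE)) {
  library(summarytools)

  # Set markdown-style output once, instead of inside every freq() call
  st_options(plain.ascii = FALSE, style = "rmarkdown")

  # This call now uses the rmarkdown style without extra arguments
  print(freq(murdercat, order = "freq"))
}
```

The summarytools documentation also recommends adding results = 'asis' to the chunk header (for example, {r, results='asis'}) so that the knitted document renders the markdown as a formatted table instead of printing it as raw text.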
sjmisc::frq() and summarytools::freq()?
frq() and freq() for creating frequency tables?
summarytools::dfSummary() as another way to quickly describe one or more variables in a data file?