Assignment 8 Objectives
The purpose of this eighth assignment is to review past
material and ensure you have mastered it.
This assignment is a little bit different. Unlike past assignments,
you will not be learning new statistical or coding techniques but rather
will be required to apply what you’ve learned. As such, this assignment
will provide very little guidance. Instead, I expect you to return to
past assignments to guide your progression through this one. For
instance, if I ask you to calculate the mean of a variable, you
should reference Assignment 5, which walked you through calculations for
measures of central tendency.
By the end of Assignment 8, you should have mastered how to…
- open and set up an RMarkdown file from scratch (being sure to set
the correct working directory)
- load in required packages using the library command
- read in a dataset using the haven package
- calculate frequency tables and establish levels of measurement for
multiple variables
- plot variables as a bar graph using the ggplot2 package
- calculate the mean, median, and mode for any given variable
- calculate the standard deviation or variation ratio for any given
variable
- convert raw scores to z-scores
Part 1 (Assignment 8.1)
Goal: Read in YRBS, Youth, and States data
(Note: Remember that, when following instructions, always substitute
your own last name for “LastName” and the actual date for YEAR-MO-DY.
E.g., 2023-02-02_Ducate_CRIM5305_Assign08)
- Open RStudio and create a new RMarkdown file for Assignment 8
- I recommend you close RStudio and open it fresh by opening it via
your previous assignment (see past assignments for why we do it this
way)
- Create a second-level header titled: “Part 1 (Assignment 8.1).”
- This assignment must be completed by the student and the student
alone. To confirm that this is your work, please begin all assignments
with this text: This R Markdown document contains my work for
Assignment 8. It is my work and only my work.
- Create a third-level header in the R Markdown (hereafter, “RMD”) file
titled: “Load Libraries”
- Insert an R chunk and load in all packages you think will be
necessary for this assignment.
- Remember: if you try to run a command and get an error message such
as Error in read_spss() : could not find function "read_spss", it almost
certainly means you failed to load a required package
- After your first code chunk, create another third-level header in
RMD titled: “Read Data into R”
- Insert another R code chunk.
- In the new R code chunk, read and assign the Youth_0.sav data into an
R object called YouthData, the 2013 YRBS.sav data into an object called
YRBSData, and the 2012 states data.sav data into an object called
StatesData
- In the same code chunk, on a new line below your read data/assign
object command, type the name of your new R data object to call it and
provide a brief view of the data.
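As a reminder of the general pattern only, here is a minimal sketch of loading a package, reading an SPSS file, and calling the resulting object. It assumes the haven package is installed. The file and the object name ToyData are made up for illustration: the sketch writes a tiny stand-in .sav file to a temporary location so it can run without the course data. Your own chunk should point at the three assignment files instead.

```r
# Load the package that provides read_spss(); skipping this step is what
# produces the "could not find function" error.
library(haven)

# Hypothetical stand-in for a course .sav file, written to a temporary
# location so this sketch runs without the real data.
toy_path <- tempfile(fileext = ".sav")
write_sav(data.frame(id = c(1, 2, 3), age = c(14, 15, 16)), toy_path)

# Read the SPSS file and assign it to an object.
ToyData <- read_spss(toy_path)

# Typing the object's name on its own line prints a brief view of the data.
ToyData
```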
Part 2 (Assignment 8.2)
Goal: Describe the distribution of variables
- Create a second-level header titled: Part 2 (Assignment 8.2)
- Create a third-level header titled: Descriptive Statistics for Age
in Youth Dataset
- Create a fourth-level header titled: Frequency Table
- Create a new code chunk and generate a frequency table for the age of
the participant in the Youth dataset (Hint: look at the labels of
variables to determine which one reports participant age)
- Remove the NAs (if there are any) by adding remove.na = TRUE or
show.na = FALSE; which argument you use will depend on which command you
use to generate your frequency tables (remember, we learned 2 primary
ones)
- Create another fourth-level header titled: Measures of Central
Tendency
- Create a new code chunk and calculate the median and the mean for
age of the participant in the Youth dataset. Though there are many ways
to do this, use the base-R functions (e.g., median()).
- Create another fourth-level header titled: Measures of Dispersion
- Create a new code chunk and calculate the standard deviation for age
of the participant in the Youth dataset. Though there are many ways to
do this, use the base-R function sd().
- Repeat the above two steps for the variables sex, qn24, q14, and q18
in the YRBS dataset. Be sure to include the appropriate third- and
fourth-level headers.
- Note: Items labeled range: 1-2 are coded as 1 = Yes and 2 = No
- Note: There is missing data in these variables. By default, base-R
functions will produce a result of NA for any variable with missing
data. To make it calculate your measures without the missing data, add
the argument na.rm = TRUE.
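To see what the na.rm = TRUE argument changes, here is a small sketch using base-R functions on a made-up vector. The values and the name toy_age are hypothetical and are not taken from the Youth or YRBS data.

```r
# Made-up ages with one missing value (not from the course data).
toy_age <- c(14, 15, 15, 16, NA, 17)

# Without na.rm, a single missing value makes the result NA.
mean(toy_age)

# With na.rm = TRUE, the missing value is dropped before calculating.
age_mean   <- mean(toy_age, na.rm = TRUE)
age_median <- median(toy_age, na.rm = TRUE)
age_sd     <- sd(toy_age, na.rm = TRUE)

# Print the three results together.
c(mean = age_mean, median = age_median, sd = age_sd)
```

The same argument works the same way in median() and sd(), so you only need to remember one pattern.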
Part 3 (Assignment 8.3)
Goal: Create Cross-Tabulations of grade (grade) and lifetime alcohol
use (qn41) variables
- Create a second-level header titled: Part 3 (Assignment 8.3)
- Create a third-level header titled: Cross-Tabulation of Grade and
Alcohol Use
- Insert an R chunk and create a cross-tabulation table with grade as
the independent variable and qn41 (how many days they drank alcohol in
their life) as the dependent variable. Be sure to put your IV in the
columns and give the table a reasonable title.
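If you want to check that you are reading your table correctly, the layout (DV in rows, IV in columns, column percentages) can be sketched in base R. This is not the sjPlot approach from past assignments, just a quick cross-check. The data frame toy and its columns iv and dv are made up for illustration.

```r
# Made-up responses (not from the YRBS data); iv and dv are hypothetical names.
toy <- data.frame(
  iv = c("9th", "9th", "9th", "10th", "10th", "10th", "10th", NA),
  dv = c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes")
)

# First argument goes in the rows (DV), second in the columns (IV).
# Rows with a missing value are dropped by default.
counts <- table(toy$dv, toy$iv, dnn = c("dv", "iv"))

# Column percentages: within each IV category, the percentages sum to 100.
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

Column percentages are the ones to compare across categories of the IV, which is why the IV belongs in the columns.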
Part 4 (Assignment 8.4)
Goal: Calculating Z-score and Creating Histogram for AssaultRt Variable
- Create a second-level header titled: Part 4 (Assignment 8.4).
- Create a third-level header titled: Creating Histogram for AssaultRt
Variable.
- Insert an R chunk and create a histogram of the AssaultRt variable.
- Create a new column (variable) with the AssaultRt variable values
converted into standardized z-score values.
- Create a third-level header titled: “Converting AssaultRt Values to
Z-scores”
- Insert an R chunk, then select only the columns we need - State &
AssaultRt - and assign only these two variables (columns) into a new
data object called StatesDataSub
- Now, create a new variable called ZAssaultRt using the mutate()
function. You can either write out the z-score formula,
(AssaultRt - mean(AssaultRt)) / sd(AssaultRt), or use the built-in
scale() function, which does essentially the same thing. Note the
parentheses: the mean must be subtracted before dividing by the
standard deviation.
- Remember, you can use the View() command to see your dataset. You can
also use the gt() command to create a prettier table output.
- Congratulations! You should now have everything that you need to
complete the questions in Assignment 8 that review the material and
exercises covered in Assignments 1-7. Remember:
- Keep the file clean and easy to follow by using RMD level headings
(e.g., denoted with ## or ###) separating R code chunks, organized by
assignment questions.
- Write plain text after headings and before or after code chunks to
explain what you are doing - such text will serve as useful reminders to
you when working on later assignments!
- Upon completing the assignment, “knit” your final RMD file again and
save the final knitted Word document as:
YEAR-MO-DY_LastName_CRIM5305_Assign08. Submit via
Blackboard in the relevant section for Assignment 8.
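For the z-score step in Part 4, here is a minimal sketch of the select-then-mutate pattern, assuming the dplyr package is installed. The data frame ToyStates, its values, and the extra columns Other and ZScaled are made up for illustration. Your own chunk should use StatesData and create StatesDataSub.

```r
library(dplyr)

# Made-up rates for illustration only; not the values in the course data.
ToyStates <- data.frame(
  State     = c("A", "B", "C", "D"),
  AssaultRt = c(100, 200, 300, 400),
  Other     = c(1, 2, 3, 4)
)

ToySub <- ToyStates %>%
  # Keep only the two columns needed.
  select(State, AssaultRt) %>%
  mutate(
    # Written-out formula: subtract the mean, then divide by the sd.
    ZAssaultRt = (AssaultRt - mean(AssaultRt)) / sd(AssaultRt),
    # scale() returns a one-column matrix, so convert it to a plain vector.
    ZScaled = as.numeric(scale(AssaultRt))
  )

ToySub
```

The two new columns should match, and a standardized variable should have a mean of 0 and a standard deviation of 1, which makes for a quick check of your own work.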
Assignment 8 Objective Checks
After completing Assignment 8, do you feel you have mastered…
- opening a new RMarkdown file?
- loading in required packages?
- reading in datasets and assigning them to objects?
- creating frequency tables of individual variables?
- calculating descriptive statistics (mean, median, mode, standard
deviation, range, variation ratio)?
- the basic elements of a contingency table (aka crosstab)?
- removing missing observations from frequency tables?
- generating a crosstab in R using dplyr::select() &
sjPlot::sjtab(depvar, indepvar)?
- adding a title and column percents to an sjtab() table and switching
output from viewer to html browser?
- converting raw column (variable) values into standardized z-score
values using mutate()?