Assignment 8 Objectives

The purpose of this eighth assignment is to review past material and ensure you have mastered it.

This assignment is a little bit different. Unlike past assignments, you will not be learning new statistical or coding techniques but rather will be required to apply what you’ve learned. As such, this assignment will provide very little guidance. Instead, I expect you to return to past assignments to guide your progression through this one. For instance, if I ask you to calculate the mean of a variable, you should reference Assignment 5, which walked you through calculations for measures of central tendency.

By the end of Assignment 8, you should have mastered how to…

  • open and set up an RMarkdown file from scratch (being sure to set the correct working directory)
  • load in required packages using the library command
  • read in a dataset using the haven package
  • calculate frequency tables and establish levels of measurement for multiple variables
  • plot variables as a bar graph using the ggplot2 package
  • calculate the mean, median, and mode for any given variable
  • calculate the standard deviation or variation ratio for any given variable
  • convert raw scores to z-scores

Part 1 (Assignment 8.1)

Goal: Read in YRBS, Youth, and States data

(Note: When following instructions, always replace “LastName” with your own last name and YEAR-MO-DY with the actual date, e.g., 2023-02-02_Ducate_CRIM5305_Assign08.)

  1. Open RStudio and create a new RMarkdown file for Assignment 8
    • I recommend closing RStudio and then reopening it via your previous assignment (see past assignments for why we do it this way).

  2. Create a second-level header titled: “Part 1 (Assignment 8.1).”
    1. This assignment must be completed by the student and the student alone. To confirm that this is your work, please begin all assignments with this text: This R Markdown document contains my work for Assignment 8. It is my work and only my work.

  3. Create a third-level header in your R Markdown (hereafter, “RMD”) file titled: “Load Libraries”
    1. Insert an R chunk and load in all packages you think will be necessary for this assignment.
    2. Remember: if you try to run a command and get an error message such as Error in read_spss() : could not find function "read_spss", it almost certainly means you have not loaded a required package.
  4. After your first code chunk, create another third-level header in RMD titled: “Read Data into R”
    1. Insert another R code chunk.
    2. In the new R code chunk, read and assign the Youth_0.sav data into an R object called YouthData, the 2013 YRBS.sav data into an object called YRBSData, and the 2012 states data.sav data into an object called StatesData
    3. In the same code chunk, on a new line below your read data/assign object commands, type the name of each new R data object to call it and print a brief view of the data.
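A minimal sketch of the Part 1 chunks, assuming the haven package is installed. So that it runs without the course files, it first writes a tiny hypothetical dataset (the demo.sav file and its age and sex columns are made up for illustration) to a temporary .sav file with write_sav() and then reads it back with read_sav(). In your own RMD you would skip the writing step and pass the real file names (Youth_0.sav, 2013 YRBS.sav, 2012 states data.sav), which read_sav() looks for relative to your working directory.

```r
# Load the package that provides read_sav() (and read_spss()).
library(haven)

# Hypothetical demo data standing in for a course .sav file.
demo <- data.frame(age = c(14, 15, 15, 16, NA),
                   sex = c(1, 2, 2, 1, 2))

# Write it to a temporary SPSS file so this sketch is self-contained.
demo_path <- file.path(tempdir(), "demo.sav")
write_sav(demo, demo_path)

# Read the .sav file and assign it to an R object.
# In your RMD, with the working directory set to your data folder, this would be
# e.g. YouthData <- read_sav("Youth_0.sav")
DemoData <- read_sav(demo_path)

# Typing the object's name prints a brief view of the data.
DemoData
```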

Part 2 (Assignment 8.2)

Goal: Describe the distribution of variables

  1. Create a second-level header titled: Part 2 (Assignment 8.2)
  2. Create a third-level header titled: Descriptive Statistics for Age in Youth Dataset
  3. Create a fourth-level header titled: Frequency Table
    • Create a new code chunk and generate a frequency table for the age of the participant in the Youth dataset (Hint: look at the labels of variables to determine which one reports participant age)

    • Remove the NAs (if there are any) by adding remove.na = TRUE or show.na = FALSE; which argument you use will depend on which command you use to generate your frequency tables (remember, we learned 2 primary ones)

  4. Create another fourth-level header titled: Measures of Central Tendency
    • Create a new code chunk and calculate the median and the mean for age of the participant in the Youth dataset. Though there are many ways to do this, use the base-R functions median() and mean().
  5. Create another fourth-level header titled: Measures of Dispersion
    • Create a new code chunk and calculate the standard deviation for age of the participant in the Youth dataset. Though there are many ways to do this, use the base-R function sd().
  6. Repeat the above steps for the variables sex, qn24, q14, and q18 in the YRBS dataset. Be sure to include the appropriate third- and fourth-level headers.
    • Note: Items labeled range: 1-2 are coded as 1 = Yes and 2 = No.
    • NOTE: There is missing data in these variables. By default, base-R functions such as mean(), median(), and sd() return NA for any variable with missing data. To calculate your measures using only the non-missing values, add the argument na.rm = TRUE.
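The Part 2 calculations can be sketched in base R as follows. The age vector is hypothetical toy data, and base R's table() is used only as a stand-in for the frequency-table commands from earlier assignments. Base R has no built-in function for the statistical mode or the variation ratio, so stat_mode() and variation_ratio() are hypothetical helpers written for this example; the variation ratio is computed as 1 minus the proportion of valid cases that fall in the modal category.

```r
# Hypothetical ages with one missing value.
age <- c(14, 15, 15, 16, 16, 16, 17, NA)

# Frequency table; useNA = "no" leaves missing values out of the counts.
table(age, useNA = "no")

# Without na.rm = TRUE, each of these would return NA.
mean(age, na.rm = TRUE)
median(age, na.rm = TRUE)
sd(age, na.rm = TRUE)

# Hypothetical helper: the most frequent value(s), returned as character.
stat_mode <- function(x) {
  counts <- table(x)  # table() drops NAs by default
  names(counts)[counts == max(counts)]
}

# Hypothetical helper: variation ratio = 1 - (modal frequency / valid N).
variation_ratio <- function(x) {
  counts <- table(x)
  1 - max(counts) / sum(counts)
}

stat_mode(age)
variation_ratio(age)
```

For a nominal variable such as sex, the mode and variation ratio are the appropriate summaries; the mean, median, and standard deviation make sense only for variables measured at higher levels.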

Part 3 (Assignment 8.3)

Goal: Create a cross-tabulation of the grade (grade) and lifetime alcohol use (qn41) variables

  1. Create a second-level header titled: Part 3 (Assignment 8.3)
  2. Create a third-level header titled: Cross-Tabulation of Grade and Alcohol Use
    1. Insert an R chunk and create a cross-tabulation table with grade as the independent variable and qn41 (how many days they drank alcohol in their life) as the dependent variable. Be sure to put your IV in the columns and give the table a reasonable title.
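As a base-R cross-check of the sjtab() table, a crosstab with the IV in the columns can be sketched with table() and prop.table(). The grade and qn41 values below are hypothetical toy codes, not the real YRBS coding. Setting margin = 2 makes each column sum to 100%, which gives the column percentages you compare across categories of the IV.

```r
# Hypothetical toy codes; not the real YRBS values.
grade <- c(1, 1, 2, 2, 2, 3, 3, 3, NA)
qn41  <- c(1, 2, 1, 1, 2, 2, 2, 1, 1)

# DV in the rows, IV in the columns; cases with NA in either variable
# are left out by default.
xtab <- table(qn41, grade, dnn = c("Lifetime alcohol use (qn41)", "Grade"))
xtab

# Column percentages: each grade (column) sums to 100.
col_pct <- round(100 * prop.table(xtab, margin = 2), 1)
col_pct
```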

Part 4 (Assignment 8.4)

Goal: Calculate Z-Scores and Create a Histogram for the AssaultRt Variable

  1. Create a second-level header titled: Part 4 (Assignment 8.4).
  2. Create a third-level header titled: Creating Histogram for AssaultRt Variable.
    1. Insert an R chunk and create a histogram of the AssaultRt variable.
  3. Create a new column (variable) with the AssaultRt variable values converted into standardized z-score values.
    1. Create a third-level header titled: “Converting AssaultRt Values to Z-scores”
    2. Insert an R chunk, then select only the columns we need - State & AssaultRt - and assign only these two variables (columns) into a new data object called StatesDataSub
  4. Now, create a new variable called ZAssaultRt using the mutate() function. You can either write out the z-score formula, (AssaultRt - mean(AssaultRt)) / sd(AssaultRt), or use the built-in scale() function, which does essentially the same thing (scale() returns a one-column matrix, so wrapping it in as.numeric() gives a plain numeric column).
    1. Remember, you can use the View() command to see your dataset. You can also use the gt() command to create a prettier table output.
  5. Congratulations! You should now have everything that you need to complete the questions in Assignment 8 that review the material and exercises covered in Assignments 1-7. Remember:
    1. Keep the file clean and easy to follow by using RMD headings (e.g., ## or ###) to separate R code chunks, organized by assignment question.
    2. Write plain text after headings and before or after code chunks to explain what you are doing - such text will serve as useful reminders to you when working on later assignments!
    3. Upon completing the assignment, “knit” your final RMD file again and save the final knitted Word document as: YEAR-MO-DY_LastName_CRIM5305_Assign08. Submit via Blackboard in the relevant section for Assignment 8.
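A base-R sketch of Part 4, assuming a small hypothetical stand-in for StatesData (the state labels, the Other column, and the AssaultRt values below are invented for illustration). It draws the histogram with hist() rather than ggplot2, keeps State and AssaultRt with base subsetting instead of select(), and computes the z-scores both by formula and with scale() to show that the two agree.

```r
# Hypothetical stand-in for StatesData; values are invented.
StatesData <- data.frame(
  State     = c("A", "B", "C", "D", "E"),
  AssaultRt = c(100, 200, 300, 400, 500),
  Other     = c(1, 2, 3, 4, 5)
)

# Histogram of the raw AssaultRt values.
hist(StatesData$AssaultRt, main = "Histogram of AssaultRt", xlab = "AssaultRt")

# Keep only the two columns we need
# (dplyr equivalent: StatesData %>% select(State, AssaultRt)).
StatesDataSub <- StatesData[, c("State", "AssaultRt")]

# z = (raw score - mean) / standard deviation; note where the parentheses go.
StatesDataSub$ZAssaultRt <- (StatesDataSub$AssaultRt - mean(StatesDataSub$AssaultRt)) /
  sd(StatesDataSub$AssaultRt)

# Same values via scale(); as.numeric() drops the matrix structure.
z_scale <- as.numeric(scale(StatesDataSub$AssaultRt))

StatesDataSub
```

With dplyr loaded, the same column could be added with StatesDataSub <- StatesDataSub %>% mutate(ZAssaultRt = (AssaultRt - mean(AssaultRt)) / sd(AssaultRt)). If the real AssaultRt column contains missing values, add na.rm = TRUE to mean() and sd().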

Assignment 8 Objective Checks

After completing Assignment 8, do you feel you have mastered…

  • opening a new RMarkdown file?
  • loading in required packages?
  • reading in datasets and assigning them to objects?
  • creating frequency tables of individual variables?
  • calculating descriptive statistics (mean, median, mode, standard deviation, range, variation ratio)?
  • the basic elements of a contingency table (aka crosstab)?
  • removing missing observations from frequency tables?
  • generating a crosstab in R using dplyr::select() and sjPlot::sjtab(depvar, indepvar)?
    • adding a title and column percentages to an sjtab() table and switching the output from the Viewer pane to an HTML browser?
  • converting raw column (variable) values into standardized z-score values using mutate()?