## The purpose of this tenth assignment is to help you use R to complete some of the SPSS Exercises from the end of Chapter 10 in Bachman, Paternoster, & Wilson’s Statistics for Criminology & Criminal Justice, 5th Ed.

In the last assignment, you learned how to make statistical
inferences from a contingency table by conducting and appropriately
interpreting chi-squared tests of independence. You also learned how to
estimate some measures of the magnitude of an association in a crosstab,
such as Cramer’s *V* or the phi-coefficient. In this assignment,
you will learn how to find the difference between two independent sample
means in R and conduct an independent sample *t*-test of equality
in population means.
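
As a rough preview of what that looks like, the sketch below uses the built-in `mtcars` data rather than this assignment's data file. The choice of `mpg` as the outcome and `am` (transmission type) as the grouping variable is an assumption made only for illustration.

```r
# Illustration only: built-in mtcars data, not the assignment data
data(mtcars)

# Mean mpg for each transmission group (am: 0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
group_means

# Difference between the two independent sample means (manual minus automatic)
mean_diff <- unname(group_means["1"] - group_means["0"])
mean_diff

# Independent samples t-test of equality in population means
# (base R's default is the separate variance, or Welch, version)
ttest_am <- t.test(mpg ~ am, data = mtcars)
ttest_am
```

Assigning the test to an object (here `ttest_am`) lets you call specific elements later, such as `ttest_am$p.value`.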

- recognize that you can add inline R code to RMD text with `` `r ` ``, which can improve reproducibility and accuracy of reporting results by helping avoid typos or copy-and-paste errors and by auto-updating results
- recognize that you can type LaTeX-style mathematical equations in your R Markdown text by using a single `$` for inline text equations or two `$$` to offset equations on a new line
- be able to filter data using `datawizard::data_filter()`
- be able to change a numeric variable to a factor variable with `mutate(newvar = as.factor(oldvar))`
- be able to conduct an independent samples pooled or separate variance *t*-test using `t.test()` and interpret the results
  - know how to conduct a Levene’s test for equality of variances using `car::leveneTest()`
- be able to generate a half-violin/half-dotplot with `geom_violindot()` to visualize group distributions and sample sizes simultaneously
- know how to use `aes(fill=groupvar)` to fill plots with different colors for each category of a grouping variable
- know how to manually change the fill colors in a ggplot object using `scale_fill_manual()`
- know how to add a horizontal or vertical line to a ggplot object with `geom_hline()` or `geom_vline()`
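
To make the pooled versus separate variance distinction concrete, here is a minimal sketch using the built-in `mtcars` data, not the assignment data. It uses base R `as.factor()` as a stand-in for the `mutate()` approach so that it runs without extra packages, and it only calls `car::leveneTest()` if the car package happens to be installed. The object names (`cars`, `am_f`, `t_pooled`, `t_separate`) are hypothetical.

```r
# Illustration only: built-in mtcars data, not the assignment data
cars <- mtcars

# Change the numeric 0/1 transmission variable to a factor variable
# (base R stand-in for mutate(newvar = as.factor(oldvar)))
cars$am_f <- as.factor(cars$am)

# Levene's test for equality of variances, run only if car is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(mpg ~ am_f, data = cars))
}

# Pooled variance t-test (assumes equal population variances)
t_pooled <- t.test(mpg ~ am_f, data = cars, var.equal = TRUE)

# Separate variance (Welch) t-test (does not assume equal variances)
t_separate <- t.test(mpg ~ am_f, data = cars, var.equal = FALSE)

# Compare test statistics, degrees of freedom, and p-values side by side
data.frame(
  test = c("pooled", "separate"),
  t    = unname(c(t_pooled$statistic, t_separate$statistic)),
  df   = unname(c(t_pooled$parameter, t_separate$parameter)),
  p    = c(t_pooled$p.value, t_separate$p.value)
)
```

The two versions differ only in the `var.equal=` argument. The pooled test uses n1 + n2 - 2 degrees of freedom, while the separate variance test uses an adjusted (usually smaller and non-integer) value.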

## We are building on objectives from Assignments 1-9. By the start of this assignment, you should already know how to:

- create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
- knit your RMD document into an HTML file that you can then save and submit for course credit
- install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run
- recognize when a function is being called from a specific package using a double colon with the `package::function()` format
- read in an SPSS data file in an R code chunk using `haven::read_spss()` and assign it to an R object using an assignment (`<-`) operator
- use the `$` symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format `dataobject$varname`
- use a tidyverse `%>%` pipe operator to perform a sequence of actions
- recognize the R operator `!=` as “not equal to”
- turn off or change scientific notation in R, such as `options(scipen=999, digits = 3)`
- create a list or vector and assign to object, such as `listname <- c(item1, item2)`

- recognize that you can use `lapply()` to create a list of objects, which can help you avoid cluttering the R Environment with objects
- recognize that you can create your own R functions (e.g., our `funxtoz()` function) and that doing so is recommended for duplicate tasks to avoid copy-and-paste errors
- recognize that `round()` can be used to specify number of decimals on numeric values

- use `here()` for a simple and reproducible self-referential file directory method
- use `groundhog.library()` as an optional but recommended reproducible alternative to `library()` for loading packages
- improve reproducibility of randomization tasks in R by setting the random number generator seed using `set.seed()`
- know that you can share examples or troubleshoot code in a reproducible way by using built-in datasets like `mtcars` that are universally available to R users

- use the base R `head()` function to quickly view the first few rows of data
- use the base R `tail()` function to quickly view the last few rows of data
- use the `glimpse()` function to quickly view all columns (variables) in your data
- use `sjPlot::view_df()` to quickly browse variables in a data file
- use `attr()` to identify variable and attribute value labels

- recognize when missing values are coded as `NA` for variables in your data file
- remove missing observations from a variable in R when appropriate using `filter(!is.na(var))`
- change a numeric variable to a factor (e.g., nominal or ordinal) variable with `haven::as_factor()`
- drop an unused factor level (e.g., missing “Don’t know” label) on a variable using `data %>% droplevels(data$variable)`

- select and recode variables using dplyr’s `select()`, `mutate()`, and `if_else()` functions
- convert raw column (variable) values into standardized z-score values using `mutate()`
- select random sample from data without or with replacement using `dplyr::sample_n()`
- select data with conditions using `dplyr::filter()` and `%in%` operator
- simulate data from normal, truncated normal, or uniform probability distributions using `rnorm()`, `truncnorm::rtruncnorm()`, or `runif()`

- draw random samples from data in R
  - draw one random sample with `dplyr::slice_sample()`
  - draw multiple (“replicate”) random samples with `infer::rep_slice_sample()`
- recognize that one can manually build a simple tibble row-by-row using tidyverse’s `tibble::tribble()`
- use `summarytools::dfSummary()` to quickly describe one or more variables in a data file
- create frequency tables with `sjmisc::frq()` and `summarytools::freq()` functions
- sort frequency distributions (lowest to highest/highest to lowest) with `summarytools::freq()`
- calculate measures of central tendency for a frequency distribution
  - calculate central tendency using base R functions `mean()` and `median()` (e.g., `mean(data$variable)`)
  - calculate central tendency and other basic descriptive statistics for specific variables in a dataset using `summarytools::descr()` and `psych::describe()` functions
- calculate measures of dispersion for a variable distribution
  - calculate dispersion measures by hand from frequency tables you generate in R
  - calculate some measures of dispersion (e.g., standard deviation) directly in R (e.g., with `sjmisc::frq()` or `summarytools::descr()`)

- recognize and read the basic elements of a contingency table (aka crosstab)
  - place IV in columns and DV in rows of a crosstab
  - recognize column/row marginals and their overlap with univariate frequency distributions
  - calculate marginal, conditional, and joint (frequentist) probabilities
  - compare column percentages (when an IV is in columns of a crosstab)

- generate and modify a contingency table (crosstab) in R with `dplyr::select()` & `sjPlot::sjtab(depvar, indepvar)` or with `crosstable(depvar, by=indepvar)`
- improve some knitted tables by piping a function’s results to `gt()` (e.g., `head(data) %>% gt()`)
  - modify elements of a `gt()` table, such as adding titles/subtitles with Markdown-formatted (e.g., `**bold**` or `*italicized*`) fonts
  - recognize that a `gt()` table can be modified to add or remove the decimals in specific columns or rows with `fmt_number()`
- create basic graphs using ggplot2’s `ggplot()` function
  - generate simple bar charts and histograms to visualize shape and central tendency of a frequency distribution
  - generate boxplots using base R `boxplot()` and `ggplot()` to visualize dispersion in a data distribution
- modify elements of a ggplot object
  - change outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`) by adding `fill=` and `color=` followed by specific color names (e.g., “orange”) or hexadecimal codes (e.g., “#990000” for crimson; “#EDEBEB” for cream)
  - add or change a preset theme (e.g., `+ theme_minimal()`) to a ggplot object to conveniently modify certain plot elements (e.g., white background color)
  - select colors from a colorblind accessible palette (e.g., using `viridisLite::viridis()`) and specify them for the outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`)
  - add a title (and subtitle or caption) to a ggplot object by adding a label with the `labs()` function (e.g., `+ labs(title = "My Title")`)
- be able to combine multiple `ggplot()` plots into a single figure using “patchwork” package
  - know how to customize plot layout and add title to patchwork figure
  - recognize that one can write a custom function to repeatedly generate similar plots before combining them with patchwork
  - recognize you can use `patchwork::wrap_plots()` to quickly combine ggplot objects contained in a list
- recognize that one can plot means and confidence intervals using `ggplot() + geom_point() + geom_errorbar()`
- recognize that one can add elements like vertical lines (`+ geom_vline()`), arrows (`+ geom_segment()`), or text (`+ annotate()`) to a `ggplot()` object

- conduct and interpret a null hypothesis significance test
  - specify null (test) hypothesis & identify contrasting alternative hypothesis (or hypotheses)
  - set an alpha or significance level (e.g., as risk tolerance or false positive error control rate)
  - calculate a test statistic and corresponding *p*-value
  - compare a test *p*-value to alpha level and then determine whether the evidence is sufficient to reject the null hypothesis or should result in a failure to reject the null hypothesis
- conduct a binomial hypothesis test in R with `rstatix::binom_test()`

- generate and interpret confidence intervals to quantify uncertainty caused by sampling variability
  - identify *t* or *z* critical values associated with a two-tailed confidence level using `qt()` or `qnorm()`
  - estimate the standard error of a sample mean or proportion in R
  - estimate a two-tailed confidence interval around a sample mean or proportion in R
  - properly interpret and avoid common misinterpretations of confidence intervals
- be able to conduct a one-sample *z* or *t* hypothesis test of the difference between a sample and assumed population mean in R and interpret results
  - be able to conduct a one-sample test using the base R `t.test()` function
  - be able to manually calculate a *z* or *t* test statistic by typing formula in R
  - be able to conduct a one-sample test using `infer::t_test()`
  - know how to use `infer::visualize()` to visualize where your sample statistic would fall in the sampling distribution associated with your null hypothesis
- conduct a chi-squared test of independence using `sjPlot::sjtab()` or `chisq.test()` and interpret results
  - specify different measures of association with `statistics=` using `sjPlot::sjtab()` and interpret them appropriately
  - generate observed and expected frequencies by assigning results of `chisq.test()` to an object (e.g., `chisq`) and then calling elements from object (e.g., `chisq$observed` or `chisq$expected`)
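
As a brief refresher on one of the items above, the following sketch builds a two-tailed 95% confidence interval around a sample mean by hand with `qt()` and then checks it against base R's `t.test()`. It uses the built-in `mtcars` data purely as an example, and the object names (`x`, `se`, `t_crit`, `ci`) are hypothetical.

```r
# Illustration only: built-in mtcars data, not the assignment data
x <- mtcars$mpg
n <- length(x)

# Estimated standard error of the sample mean
se <- sd(x) / sqrt(n)

# t critical value for a two-tailed 95% confidence level
t_crit <- qt(0.975, df = n - 1)

# Lower and upper bounds of the confidence interval
ci <- c(lower = mean(x) - t_crit * se, upper = mean(x) + t_crit * se)
round(ci, 2)

# The same interval as reported by a one-sample t.test()
t.test(x)$conf.int
```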

*If you do not recall how to do these things, review Assignments
1-9.*

Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readi