In the last assignment, you learned how to conduct a one-sample
*z* or *t* hypothesis test of the difference between a
sample and population mean and then, given the test results and the null
hypothesis, to make an appropriate inference about the population mean
by either rejecting or failing to reject the null hypothesis. In this
assignment, you will learn how to make population inferences about the
relationship between two categorical variables by conducting a
chi-squared test of independence on a sample contingency table
(crosstab).

- recognize you can manually build a simple tibble row-by-row using tidyverse’s `tibble::tribble()`
- recognize you can use `round()` to specify number of decimals on numeric values
- recognize you can modify a `gt()` table to add or remove the decimals in specific columns or rows with `fmt_number()`
- be able to conduct a chi-squared test of independence using `sjPlot::sjtab()` or `chisq.test()` and interpret results
  - know how to specify different measures of association with `statistics=` using `sjPlot::sjtab()` and interpret them appropriately
  - know how to generate observed and expected frequencies by assigning results of `chisq.test()` to an object (e.g., `chisq`) and then calling elements from object (e.g., `chisq$observed` or `chisq$expected`)
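
To make these objectives concrete, here is a minimal sketch of the base R route through `chisq.test()`. The counts and the column names (`dv`, `iv_no`, `iv_yes`) are hypothetical and invented only for illustration; the sketch assumes the tibble package is installed.

```r
library(tibble)  # provides tribble()

# Hypothetical counts, invented purely for illustration
counts <- tribble(
  ~dv,    ~iv_no, ~iv_yes,
  "low",      30,      10,
  "high",     20,      40
)

# Convert the two count columns to a matrix so chisq.test() can use them
tab <- as.matrix(counts[, c("iv_no", "iv_yes")])
rownames(tab) <- counts$dv

# correct = FALSE turns off the Yates continuity correction for 2x2 tables
chisq <- chisq.test(tab, correct = FALSE)

chisq$observed             # observed frequencies
round(chisq$expected, 1)   # expected frequencies under independence
round(chisq$statistic, 2)  # chi-squared test statistic
chisq$p.value              # p-value to compare against your alpha level
```

Assigning the test results to an object (here `chisq`) is what lets you pull out individual elements with `$` afterward.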

We are building on objectives from Assignments 1-10. By the start of this assignment, you should already know how to:

- create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
- knit your RMD document into an HTML file that you can then save and submit for course credit
- install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run
- recognize when a function is being called from a specific package using a double colon with the `package::function()` format
- read in an SPSS data file in an R code chunk using `haven::read_spss()` and assign it to an R object using an assignment (`<-`) operator

- use the `$` symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format `dataobject$varname`
- use a tidyverse `%>%` pipe operator to perform a sequence of actions
- recognize the R operator `!=` as “not equal to”
- turn off or change scientific notation in R, such as `options(scipen=999, digits = 3)`
- create a list or vector and assign to object, such as `listname <- c(item1, item2)`
- recognize that you can use `lapply()` to create a list of objects, which can help you avoid cluttering the R Environment with objects
- recognize that you can create your own R functions (e.g., our `funxtoz()` function) and that doing so is recommended for duplicate tasks to avoid copy-and-paste errors
- use `here()` for a simple and reproducible self-referential file directory method
- improve reproducibility of randomization tasks in R by setting the random number generator seed using `set.seed()`
- know that you can share examples or troubleshoot code in a reproducible way by using built-in datasets like `mtcars` that are universally available to R users

- use the base R `head()` function to quickly view the first few rows of data
- use the base R `tail()` function to quickly view the last few rows of data
- use the `glimpse()` function to quickly view all columns (variables) in your data
- use `sjPlot::view_df()` to quickly browse variables in a data file
- use `attr()` to identify variable and attribute value labels
- recognize when missing values are coded as `NA` for variables in your data file
- remove missing observations from a variable in R when appropriate using `filter(!is.na(var))`
- change a numeric variable to a factor (e.g., nominal or ordinal) variable with `haven::as_factor()`
- select and recode variables using dplyr’s `select()`, `mutate()`, and `if_else()` functions
- convert raw column (variable) values into standardized z-score values using `mutate()`
- select random sample from data without or with replacement using `dplyr::sample_n()`
- select data with conditions using `dplyr::filter()` and `%in%` operator
- simulate data from normal, truncated normal, or uniform probability distributions using `rnorm()`, `truncnorm::rtruncnorm()`, or `runif()`
- draw random samples from data in R
  - draw one random sample with `dplyr::slice_sample()`
  - draw multiple (“replicate”) random samples with `infer::rep_slice_sample()`

- use `summarytools::dfSummary()` to quickly describe one or more variables in a data file
- create frequency tables with `sjmisc::frq()` and `summarytools::freq()` functions
- sort frequency distributions (lowest to highest/highest to lowest) with `summarytools::freq()`
- calculate measures of central tendency for a frequency distribution
  - calculate central tendency using base R functions `mean()` and `median()` (e.g., `mean(data$variable)`)
  - calculate central tendency and other basic descriptive statistics for specific variables in a dataset using `summarytools::descr()` and `psych::describe()` functions
- calculate measures of dispersion for a variable distribution
  - calculate dispersion measures by hand from frequency tables you generate in R
  - calculate some measures of dispersion (e.g., standard deviation) directly in R (e.g., with `sjmisc::frq()` or `summarytools::descr()`)

- recognize and read the basic elements of a contingency table (aka crosstab)
  - place IV in columns and DV in rows of a crosstab
  - recognize column/row marginals and their overlap with univariate frequency distributions
  - calculate marginal, conditional, and joint (frequentist) probabilities
  - compare column percentages (when an IV is in columns of a crosstab)

- generate and modify a contingency table (crosstab) in R with `dplyr::select()` & `sjPlot::sjtab(depvar, indepvar)`
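
As a quick refresher on the crosstab items above, the sketch below builds a contingency table with the IV in columns and the DV in rows and then computes column percentages. Note that it swaps in base R's `table()` and `prop.table()` rather than the `sjPlot::sjtab()` workflow used in this course, and the choice of `am` as the IV and `cyl` as the DV from the built-in `mtcars` data is an arbitrary assumption for illustration.

```r
# Crosstab from built-in data: DV (cyl) in rows, IV (am) in columns
tab <- table(dv_cyl = mtcars$cyl, iv_am = mtcars$am)

# Joint frequencies with row and column marginals added
addmargins(tab)

# Column percentages: each IV column sums to 100
col_pct <- prop.table(tab, margin = 2) * 100
round(col_pct, 1)
```

Comparing the percentages across columns within a single row is the same "compare across IV categories, within DV categories" logic described above.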

#### Data visualization & aesthetics

- improve some knitted tables by piping a function’s results to `gt()` (e.g., `head(data) %>% gt()`)
  - modify elements of a `gt()` table, such as adding titles/subtitles with Markdown-formatted (e.g., `**bold**` or `*italicized*`) fonts
- create basic graphs using ggplot2’s `ggplot()` function
  - generate simple bar charts and histograms to visualize shape and central tendency of a frequency distribution
  - generate boxplots using base R `boxplot()` and `ggplot()` to visualize dispersion in a data distribution

- modify elements of a ggplot object
  - change outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`) by adding `fill=` and `color=` followed by specific color names (e.g., “orange”) or hexadecimal codes (e.g., “#990000” for crimson; “#EDEBEB” for cream)
  - add or change a preset theme (e.g., `+ theme_minimal()`) to a ggplot object to conveniently modify certain plot elements (e.g., white background color)
  - add a title (and subtitle or caption) to a ggplot object by adding a label with the `labs()` function (e.g., `+ labs(title = "My Title")`)
- be able to combine multiple `ggplot()` plots into a single figure using “patchwork” package
  - know how to customize plot layout and add title to patchwork figure
  - recognize that one can write a custom function to repeatedly generate similar plots before combining them with patchwork
  - recognize you can use `patchwork::wrap_plots()` to quickly combine ggplot objects contained in a list

- recognize that one can plot means and confidence intervals using `ggplot() + geom_point() + geom_errorbar()`
- recognize that one can add elements like vertical lines (`+ geom_vline()`), arrows (`+ geom_segment()`), or text (`+ annotate()`) elements to a `ggplot()` object

- conduct and interpret a null hypothesis significance test
  - specify null (test) hypothesis & identify contrasting alternative hypothesis (or hypotheses)
  - set an alpha or significance level (e.g., as risk tolerance or false positive error control rate)
  - calculate a test statistic and corresponding *p*-value
  - compare a test *p*-value to alpha level and then determine whether the evidence is sufficient to reject the null hypothesis or should result in a failure to reject the null hypothesis

- conduct a binomial hypothesis test in R with `rstatix::binom_test()`

- generate and interpret confidence intervals to quantify uncertainty caused by sampling variability
  - identify *t* or *z* critical values associated with a two-tailed confidence level using `qt()` or `qnorm()`
  - estimate the standard error of a sample mean or proportion in R
  - estimate a two-tailed confidence interval around a sample mean or proportion in R
  - properly interpret and avoid common misinterpretations of confidence intervals
- be able to conduct a one-sample *z* or *t* hypothesis test of the difference between a sample and assumed population mean in R and interpret results
  - be able to conduct a one-sample test using the base R `t.test()` function
  - be able to manually calculate a *z* or *t* test statistic by typing formula in R
  - be able to conduct a one-sample test using `infer::t_test()`
  - know how to use `infer::visualize()` to visualize where your sample statistic would fall in the sampling distribution associated with your null hypothesis
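
As a refresher on the last few items above, here is a minimal sketch that calculates a one-sample *t* statistic and a 95% confidence interval by formula and then checks both against base R's `t.test()`. The variable (`mtcars$mpg`) and the null value (`mu0 = 20`) are arbitrary assumptions chosen only for illustration.

```r
# One-sample t-test on a built-in dataset; mu0 = 20 is an arbitrary null value
x <- mtcars$mpg
mu0 <- 20

n <- length(x)
se <- sd(x) / sqrt(n)             # estimated standard error of the mean
t_manual <- (mean(x) - mu0) / se  # test statistic typed out by formula

t_crit <- qt(0.975, df = n - 1)   # two-tailed critical value for 95% confidence
ci_manual <- mean(x) + c(-1, 1) * t_crit * se

t_builtin <- t.test(x, mu = mu0)  # base R one-sample t-test

t_manual                     # manual test statistic
unname(t_builtin$statistic)  # should match the manual value
ci_manual                    # manual 95% confidence interval
t_builtin$conf.int           # should match the manual interval
```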

**If you do not recall how to do these things, review
Assignments 1-10.**

Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:

- contingency tables or crosstabs
  - joint frequency distribution
  - column marginal or column frequency
  - row marginal or row frequency
  - how to compare percentage differences (across IV and within DV categories)

- chi-squared test of independence
  - observed frequency
  - expected frequency
  - how to calculate with the definitional and computational formulas

- measures of association
  - positive and negative relationships
  - how to calculate and when to use phi-coefficient, contingency coefficient, Cramer’s *V* (e.g., table size, levels of measurement)
  - how to calculate and when to use proportionate reduction in error (PRE) measures of association including lambda, Goodman & Kruskal’s gamma, or Yule’s Q (e.g., table size, levels of measurement)
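
For reference, the chi-squared quantities listed above are commonly written as shown below. The notation is an assumption and may differ from the symbols used in your book: \(f_o\) and \(f_e\) are the observed and expected cell frequencies, \(n\) is the total sample size, \(r\) and \(c\) are the numbers of rows and columns, and the sums run over all cells of the crosstab.

\[
f_e = \frac{(\text{row marginal}) \times (\text{column marginal})}{n}
\]

\[
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \quad \text{(definitional)}
\qquad
\chi^2 = \sum \frac{f_o^2}{f_e} - n \quad \text{(computational)}
\]

\[
df = (r - 1)(c - 1)
\]

\[
\phi = \sqrt{\frac{\chi^2}{n}}
\qquad
C = \sqrt{\frac{\chi^2}{n + \chi^2}}
\qquad
V = \sqrt{\frac{\chi^2}{n\,(k - 1)}}, \quad k = \min(r, c)
\]

The two versions of the \(\chi^2\) formula are algebraically equivalent, and for a 2x2 table \(k - 1 = 1\), so Cramer’s *V* reduces to the phi-coefficient.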

As noted previously, for this and all future assignments, you MUST
type all commands in by hand. *Do not copy & paste except for
troubleshooting purposes (i.e., if you cannot figure out what you
mistyped).*

Goal: Understand the null hypothesis of statistical independence and visualize chi-squared test of independence

A few assignments back (Assignment 7), you learned how to describe
the association between two categorical variables by creating and
interpreting a contingency table or crosstab. In this assignment, you
will learn how to make an inference about the relationship between two
variables in a population by conducting a chi-squared (\(\chi^2\)) test of independence on a sample
crosstab. Additionally, we will briefly introduce you to the
phi-coefficient and Cramer’s *V*, two measures of association
that can be interpreted to describe the strength of an association
between variables in a crosstab.
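
As a rough sketch of how these two measures relate to the chi-squared statistic, the code below computes the phi-coefficient and Cramer’s *V* by formula from `chisq.test()` output. The crosstab counts and the object names (`tab`, `phi`, `cramers_v`) are hypothetical and invented only for illustration.

```r
# Hypothetical 2x2 crosstab (DV in rows, IV in columns); counts are invented
tab <- matrix(c(30, 20, 10, 40), nrow = 2,
              dimnames = list(dv = c("low", "high"), iv = c("no", "yes")))

# correct = FALSE turns off the Yates continuity correction for 2x2 tables
chisq <- chisq.test(tab, correct = FALSE)
chi2 <- unname(chisq$statistic)
n <- sum(tab)

# phi-coefficient: square root of chi-squared divided by sample size
phi <- sqrt(chi2 / n)

# Cramer's V: rescales by the smaller of (rows - 1) and (columns - 1)
k <- min(nrow(tab), ncol(tab))
cramers_v <- sqrt(chi2 / (n * (k - 1)))

round(phi, 3)
round(cramers_v, 3)
```

Because this example table is 2x2, the two measures come out identical; they diverge only for larger tables, where Cramer’s *V* is the appropriate choice.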

*Note:* You might notice that your book uses “chi-square” yet
we use “chi-squared” instead (with a “d” at the end). Which
term is correct? It does not really matter as long as you realize we
are referring to the same statistical quantity.

In Assignment 7, we explained how to set up a crosstab with the independent variable (IV) in the columns and dependent variable (DV) in the rows and then how to describe the association - or lack of association - between the IV and DV by