In the last assignment, you learned how to estimate and interpret
confidence intervals around a point estimate (e.g., sample mean or
proportion). You also learned how to simulate data from basic
probability distributions to help you better understand sampling
variability and the need for interval estimates. In this assignment, you
will learn how to conduct two-tailed *z*-tests and *t*-tests
and then, given the test results and the null hypothesis, make an
appropriate inference about the population parameter by either rejecting
or failing to reject the null hypothesis.

Before you conduct your own hypothesis tests, we will first simulate population data from a normal probability distribution, then take random samples from our simulated population data and plot features of these samples. Our aim will be to help you visualize the sampling distribution of a sample mean, which should lead to a better understanding of the underlying mechanisms that allow us to make valid population inferences from samples with null hypothesis significance testing. While we will not expect you to do all of these tasks yourself, by providing our code along the way, we hope these examples will help you gain a better sense of how one might conduct simulations, create handy user-written functions, and generate more complex visualizations in R.
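For instance, the simulation workflow described above might look something like the following minimal sketch. The population parameters, object names, and sample sizes here are our own illustrative choices (not the assignment's data), and the sketch assumes the dplyr and infer packages are installed:

```r
# Illustrative sketch only; pop_data and its parameters are made up
library(dplyr)
library(infer)

set.seed(1234)  # make the randomization reproducible
pop_data <- tibble(x = rnorm(100000, mean = 100, sd = 15))  # simulated population

# Draw 500 replicate samples of n = 50 and compute each sample's mean
samp_means <- pop_data %>%
  rep_slice_sample(n = 50, reps = 500) %>%
  group_by(replicate) %>%
  summarize(xbar = mean(x))

# The distribution of xbar approximates the sampling distribution of the
# sample mean; its SD approximates the standard error, 15 / sqrt(50)
sd(samp_means$xbar)
```

Plotting `samp_means$xbar` (e.g., with a ggplot2 histogram) is one way to visualize how sample means cluster around the population mean, and how that clustering tightens as `n` grows.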

- know how to draw random samples from data in R
  - know how to draw one random sample with `dplyr::slice_sample()`
  - know how to draw multiple (“replicate”) random samples with `infer::rep_slice_sample()`
- recognize that you can use `lapply()` to create a list of objects, which can help you avoid cluttering the R Environment with objects
- recognize that you can use `patchwork::wrap_plots()` to quickly combine ggplot objects contained in a list
- be able to use the base R `tail()` function to quickly view the last few rows of data
- know that you can share examples or troubleshoot code in a reproducible way by using built-in datasets like `mtcars` that are universally available to R users
- be able to conduct a one-sample *z* or *t* hypothesis test in R and interpret results
  - be able to conduct a one-sample test using the base R `t.test()` function
  - be able to manually calculate a *z* or *t* test statistic by typing the formula in R
  - be able to conduct a one-sample test using `infer::t_test()`
  - know how to use `infer::visualize()` to visualize where your sample statistic would fall in the sampling distribution associated with your null hypothesis

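As a preview of the one-sample testing objectives above, here is a minimal sketch using the built-in `mtcars` data (universally available, so it is reproducible anywhere). The null value of 22 mpg is an arbitrary choice for illustration, not a value from the assignment:

```r
# One-sample, two-tailed t-test of H0: population mean mpg = 22
t.test(mtcars$mpg, mu = 22)

# The same test via infer's tidy interface (assumes the infer package):
# library(infer)
# mtcars %>% infer::t_test(response = mpg, mu = 22)
```

The `t.test()` output reports the *t* statistic, degrees of freedom, *p*-value, and a confidence interval, which you compare against your alpha level to decide whether to reject or fail to reject the null hypothesis.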
We are building on objectives from Assignments 1-9. By the start of this assignment, you should already know how to:

- create an R Markdown (RMD) file and add/modify text, level headers, and R code chunks within it
- knit your RMD document into an HTML file that you can then save and submit for course credit
- install/load R packages and use hashtags (“#”) to comment out sections of R code so it does not run
- recognize when a function is being called from a specific package using a double colon with the `package::function()` format
- read in an SPSS data file in an R code chunk using `haven::read_spss()` and assign it to an R object using the assignment (`<-`) operator
- use the `$` symbol to call a specific element (e.g., a variable, row, or column) within an object (e.g., dataframe or tibble), such as with the format `dataobject$varname`
- use a tidyverse `%>%` pipe operator to perform a sequence of actions
- recognize the R operator `!=` as “not equal to”
- turn off or change scientific notation in R, such as `options(scipen=999, digits = 3)`

- create a list or vector and assign it to an object, such as `listname <- c(item1, item2)`
- recognize that you can create your own R functions (e.g., our `funxtoz()` function), and that doing so is recommended for repeated tasks to avoid copy-and-paste errors
- use `here()` for a simple and reproducible self-referential file directory method
- improve reproducibility of randomization tasks in R by setting the random number generator seed using `set.seed()`
- use the base R `head()` function to quickly view a snapshot of your data
- use the `glimpse()` function to quickly view all columns (variables) in your data
- use `sjPlot::view_df()` to quickly browse variables in a data file
- use `attr()` to identify variable and attribute value labels
- recognize when missing values are coded as `NA` for variables in your data file
- remove missing observations from a variable in R when appropriate using `filter(!is.na(var))`
- change a numeric variable to a factor (e.g., nominal or ordinal) variable with `haven::as_factor()`
- drop an unused factor level (e.g., missing “Don’t know” label) on a variable using `data %>% droplevels(data$variable)`

- select and recode variables using dplyr’s `select()`, `mutate()`, and `if_else()` functions
- convert raw column (variable) values into standardized z-score values using `mutate()`
- select a random sample from data without or with replacement using `dplyr::sample_n()`
- select data with conditions using `dplyr::filter()` and the `%in%` operator
- simulate data from normal, truncated normal, or uniform probability distributions using `rnorm()`, `truncnorm::rtruncnorm()`, or `runif()`
- use `summarytools::dfSummary()` to quickly describe one or more variables in a data file
- create frequency tables with the `sjmisc::frq()` and `summarytools::freq()` functions
- sort frequency distributions (lowest to highest/highest to lowest) with `summarytools::freq()`

- calculate measures of central tendency for a frequency distribution
  - calculate central tendency using base R functions `mean()` and `median()` (e.g., `mean(data$variable)`)
  - calculate central tendency and other basic descriptive statistics for specific variables in a dataset using the `summarytools::descr()` and `psych::describe()` functions
- calculate measures of dispersion for a variable distribution
  - calculate dispersion measures by hand from frequency tables you generate in R
  - calculate some measures of dispersion (e.g., standard deviation) directly in R (e.g., with `sjmisc::frq()` or `summarytools::descr()`)

- recognize and read the basic elements of a contingency table (aka crosstab)
  - place IV in columns and DV in rows of a crosstab
  - recognize column/row marginals and their overlap with univariate frequency distributions
  - calculate marginal, conditional, and joint (frequentist) probabilities
  - compare column percentages (when an IV is in columns of a crosstab)
- generate and modify a contingency table (crosstab) in R with `dplyr::select()` & `sjPlot::sjtab(depvar, indepvar)`
- improve some knitted tables by piping a function’s results to `gt()` (e.g., `head(data) %>% gt()`)
- modify elements of a `gt()` table, such as adding titles/subtitles with Markdown-formatted (e.g., `**bold**` or `*italicized*`) fonts
- create basic graphs using ggplot2’s `ggplot()` function
  - generate simple bar charts and histograms to visualize shape and central tendency of a frequency distribution
  - generate boxplots using base R `boxplot()` and `ggplot()` to visualize dispersion in a data distribution
- modify elements of a ggplot object
  - change outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`) by adding `fill=` and `color=` followed by specific color names (e.g., “orange”) or hexadecimal codes (e.g., “#990000” for crimson; “#EDEBEB” for cream)
  - ~~add or change a preset theme (e.g., `+ theme_minimal()`) to a ggplot object to conveniently modify certain plot elements (e.g., white background color)~~
  - ~~select colors from a colorblind accessible palette (e.g., using `viridisLite::viridis()`) and specify them for the outline and fill colors in a ggplot geometric object (e.g., `geom_boxplot()`)~~
  - add a title (and subtitle or caption) to a ggplot object by adding a label with the `labs()` function (e.g., `+ labs(title = "My Title")`)
- be able to combine multiple `ggplot()` plots into a single figure using the “patchwork” package
  - know how to customize the plot layout and add a title to a patchwork figure
  - recognize that one can write a custom function to repeatedly generate similar plots before combining them with patchwork
- recognize that one can plot means and confidence intervals using `ggplot() + geom_point() + geom_errorbar()`
- recognize that one can add elements like vertical lines (`+ geom_vline()`), arrows (`+ geom_segment()`), or text (`+ annotate()`) to a `ggplot()` object
- conduct and interpret a null hypothesis significance test
  - specify a null (test) hypothesis & identify the contrasting alternative hypothesis (or hypotheses)
  - set an alpha or significance level (e.g., as a risk tolerance or false positive error control rate)
  - calculate a test statistic and corresponding *p*-value
  - compare a test *p*-value to the alpha level and then determine whether the evidence is sufficient to reject the null hypothesis or should result in a failure to reject the null hypothesis
- conduct a binomial hypothesis test in R with `rstatix::binom_test()`
- generate and interpret confidence intervals to quantify uncertainty caused by sampling variability
  - identify *t* or *z* critical values associated with a two-tailed confidence level using `qt()` or `qnorm()`
  - estimate the standard error of a sample mean or proportion in R
  - estimate a two-tailed confidence interval around a sample mean or proportion in R
  - properly interpret and avoid common misinterpretations of confidence intervals

**If you do not recall how to do these things, review
Assignments 1-9.**
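As a quick refresher on the interval-estimation objectives from Assignment 9, here is a minimal sketch of a two-tailed 95% confidence interval around a sample mean, again using the built-in `mtcars` data so it is reproducible for any R user:

```r
# Refresher sketch: 95% CI around a sample mean (mtcars is built into R)
x <- mtcars$mpg
n <- length(x)
se <- sd(x) / sqrt(n)            # estimated standard error of the mean
tcrit <- qt(0.975, df = n - 1)   # two-tailed t critical value for a 95% CI
c(mean(x) - tcrit * se, mean(x) + tcrit * se)
```

The same interval appears in the `conf.int` element of `t.test(x)`, which is a handy way to check a hand calculation.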

Additionally, you should have read the assigned book chapter and reviewed the SPSS questions that correspond to this assignment, and you should have completed any other course materials (e.g., videos; readings) assigned for this week before attempting this R assignment. In particular, for this week, I assume you understand:

- Sampling variation (aka, sampling variability)
- Hypothesis testing
  - Null hypothesis
  - Nondirectional & two-tailed hypothesis tests
  - Directional & one-tailed hypothesis tests
- One-sample hypothesis tests for population means and proportions and how to calculate them by hand:
  - a one-sample *z* test for the difference between an observed sample mean and a given population mean for large samples
  - a one-sample *t* test for the difference between an observed sample mean and a given population mean for small samples
  - a one-sample *z* test for the difference between an observed sample proportion and a given population proportion for large samples

As noted previously, for this and all future assignments, you MUST
type all commands in by hand. *Do not copy & paste except for
troubleshooting purposes (i.e., if you cannot figure out what you
mistyped).*

Goal: Understand the sampling distribution of a sample mean as well as the relationship between sample size and sampling variability

After learning about how confidence intervals help us quantify and communicate uncertainty in statistical estimates of population parameters, this week’s course materials focused on making inferences about population parameters by conducting null hypothesis tests. Briefly, this process involves:

- Making assumptions that our sample statistics can be used as meaningful estimates of population parameters that we are interested in.
- Making a baseline assumption about the population, such as by specifying a “null hypothesis” (e.g., of no difference in means, or of a specific population value).
- Selecting an appropriate test distribution (e.g., *z* or *t* distribution)
- Specifying an alpha level (i.e., a false positive risk tolerance level or error control rate) and a rejection region in the test (e.g., *z* or *t*) distribution
- Converting our sample statistic into a standardized test statistic using the test statistic formula
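The standardization step described above can be sketched by hand in R. The sample values below are made up purely for illustration:

```r
# Hypothetical one-sample values (not from the assignment data)
xbar <- 104    # observed sample mean
mu0  <- 100    # population mean under the null hypothesis
s    <- 15     # sample standard deviation
n    <- 36     # sample size

# One-sample t statistic: (xbar - mu0) / (s / sqrt(n)) = 4 / 2.5 = 1.6
t_stat <- (xbar - mu0) / (s / sqrt(n))
t_stat

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
2 * pt(-abs(t_stat), df = n - 1)
```

You would then compare this *p*-value to your alpha level (or compare `t_stat` to the critical value from `qt()`) to decide whether to reject or fail to reject the null hypothesis.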